
Manage Large Data Import in TiDB

Efficiently import massive datasets into TiDB clusters using TiDB Lightning for faster analytics and scalable performance.

Abishek S October 03, 2025

When managing large-scale databases, efficient and reliable data import is essential, especially during initial setup or migration from legacy systems. TiDB Lightning is a high-performance data import tool designed for the TiDB distributed SQL database. This guide covers what you need to get started with TiDB Lightning.
 

What is TiDB Lightning?
TiDB Lightning is a powerful tool within the TiDB ecosystem designed to facilitate fast and efficient data import into TiDB clusters. It is particularly useful for large-scale data migration tasks, offering high performance and scalability. 
 

Purpose

The primary purpose of TiDB Lightning is to import data at a large scale, often used for initial data import into TiDB clusters. It is capable of processing data at a high speed, which is crucial for handling large datasets efficiently. The tool ensures data integrity through mechanisms like global checksums and metadata coordination, which are essential for maintaining data accuracy during the import process.

TiDB Lightning supports the following file formats:

  • Files exported by Dumpling.
  • CSV files.
  • Apache Parquet files generated by Amazon Aurora, Apache Hive, or Snowflake.

 

TiDB Lightning can read data from the following sources:

  • Local
  • Amazon S3
  • Google Cloud Storage

 

TiDB Lightning Import Modes

TiDB Lightning supports two import modes, selected with the backend configuration item. The import mode determines how data is written into TiDB.

 

Physical Import Mode: 

TiDB Lightning first encodes data into key-value pairs and stores them in a local temporary directory, then uploads these key-value pairs to each TiKV node, and finally calls the TiKV Ingest interface to insert the data into TiKV's RocksDB. For an initial import, consider the physical import mode, which offers the higher import speed. The backend for the physical import mode is local.

 

Logical Import Mode: 

TiDB Lightning first encodes the data into SQL statements and then runs these SQL statements directly for data import. If the cluster to be imported is in production, or if the target table to be imported already contains data, use the logical import mode. The backend for the logical import mode is tidb.

| Import mode | Physical Import Mode | Logical Import Mode |
|---|---|---|
| Backend | local | tidb |
| Speed | Fast (100~500 GiB/hour) | Low (10~50 GiB/hour) |
| Resource consumption | High | Low |
| Network bandwidth consumption | High | Low |
| ACID compliance during import | No | Yes |
| Target tables | Must be empty | Can contain data |
| TiDB cluster version | >= 4.0.0 | All |
| Whether the TiDB cluster can provide service during import | Limited service | Yes |
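The guidance above can be condensed into a small planning helper. The following is a minimal Python sketch, not part of TiDB Lightning: the function names choose_backend and estimate_hours are hypothetical, and the speed ranges are simply the figures from the comparison table, so treat the results as rough estimates rather than guarantees.

```python
# Rough planning helper based on the comparison table above.
# The function names are illustrative; they are not part of any TiDB tool.

# Throughput ranges in GiB/hour taken from the table: (slowest, fastest).
SPEED_GIB_PER_HOUR = {
    "local": (100, 500),  # physical import mode
    "tidb": (10, 50),     # logical import mode
}


def choose_backend(target_has_data: bool, cluster_in_production: bool) -> str:
    """Return the backend suggested by the guidance in this article."""
    if target_has_data or cluster_in_production:
        return "tidb"   # logical import mode
    return "local"      # physical import mode


def estimate_hours(size_gib: float, backend: str) -> tuple[float, float]:
    """Return (best_case_hours, worst_case_hours) for a dataset of size_gib."""
    slowest, fastest = SPEED_GIB_PER_HOUR[backend]
    return size_gib / fastest, size_gib / slowest


if __name__ == "__main__":
    backend = choose_backend(target_has_data=False, cluster_in_production=False)
    best, worst = estimate_hours(1000, backend)
    print(f"backend={backend}, estimated {best:.1f} to {worst:.1f} hours")
```

Actual throughput depends on hardware, network, and schema, so an estimate like this is only a starting point for scheduling an import window.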

 

TiDB Lightning Architecture

[Figure: TiDB Lightning architecture diagram]

 

Use TiDB Lightning

Prepare the source data

Use Dumpling to export the data from MySQL/TiDB.
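As an illustration of the export step, here is a minimal Python sketch that assembles a Dumpling command line. The host, port, user, thread count, and file-size values are placeholder assumptions, and build_dumpling_command is a hypothetical helper name; the output directory matches the data-source-dir used in the configuration later in this guide. Adjust every value to your own environment before running anything.

```python
import shlex


def build_dumpling_command(host, port, user, output_dir,
                           threads=8, rows_per_file=200000,
                           max_file_size="256MiB"):
    """Assemble a Dumpling export command as an argument list."""
    return [
        "tiup", "dumpling",
        "-h", host,                 # source database host
        "-P", str(port),            # source database port
        "-u", user,                 # source database user
        "--filetype", "sql",        # export as SQL files
        "-t", str(threads),         # number of export threads
        "-o", output_dir,           # directory for the exported files
        "-r", str(rows_per_file),   # split tables into chunks of this many rows
        "-F", max_file_size,        # maximum size of a single exported file
    ]


if __name__ == "__main__":
    # Placeholder connection details; replace them with your source database.
    cmd = build_dumpling_command("127.0.0.1", 3306, "root", "/data/dump/")
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
    # To actually run the export: subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to review the flags before starting a long-running export.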

 

Run TiDB Lightning

 

Set the Config

Create the configuration file tidb-lightning.toml based on your cluster information:

 

[lightning]
# Logging
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# Uses the local backend (physical import mode).
backend = "local"
# Sets the directory for temporarily storing the sorted key-value pairs.
# The target directory must be empty.
sorted-kv-dir = "/data/tidb-tmp/"

[mydumper]
# Local source data directory
data-source-dir = "/data/dump/"

# Configures the wildcard rule. By default, all tables in the mysql, sys, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, METRICS_SCHEMA, and INSPECTION_SCHEMA system databases are filtered.
# If this item is not configured, the "cannot find schema" error occurs when system tables are imported.
filter = ['*.*', '!mysql.*', '!sys.*', '!INFORMATION_SCHEMA.*', '!PERFORMANCE_SCHEMA.*', '!METRICS_SCHEMA.*', '!INSPECTION_SCHEMA.*']

[tidb]
# Information of the target cluster
host = "172.17.0.6"
port = 4000
user = "root"
password = ""
# Table schema information is fetched from TiDB via this status-port.
status-port = 10080
# The PD address of the cluster
pd-addr = "172.17.0.5:2379"
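Because the configuration requires sorted-kv-dir to be empty and data-source-dir to contain the exported files, a quick check before starting the import can save a failed run. The following is a minimal Python sketch of such a check; preflight_problems is a hypothetical helper, not a TiDB Lightning feature, and the paths in the usage section are simply the ones from the sample configuration.

```python
from pathlib import Path


def preflight_problems(sorted_kv_dir: str, data_source_dir: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    kv = Path(sorted_kv_dir)
    if kv.exists() and not kv.is_dir():
        problems.append(f"sorted-kv-dir {kv} is not a directory")
    elif kv.is_dir() and any(kv.iterdir()):
        problems.append(f"sorted-kv-dir {kv} is not empty")

    src = Path(data_source_dir)
    if not src.is_dir() or not any(src.iterdir()):
        problems.append(f"data-source-dir {src} is missing or has no files")

    return problems


if __name__ == "__main__":
    # Paths taken from the sample tidb-lightning.toml above.
    for problem in preflight_problems("/data/tidb-tmp/", "/data/dump/"):
        print("WARNING:", problem)
```

The check only looks at the two directories; it does not validate connectivity to TiDB or PD.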

 

Start TiDB Lightning

tiup tidb-lightning -config tidb-lightning.toml > lightning.out

If the import succeeds, the output contains the message tidb lightning exit successfully.
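For unattended runs, the success check can be scripted. The following is a minimal Python sketch that looks for the success message in the redirected output file; the file name lightning.out comes from the command above, while import_succeeded and SUCCESS_MARKER are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Message that TiDB Lightning prints when the import completes, per this guide.
SUCCESS_MARKER = "tidb lightning exit successfully"


def import_succeeded(output_text: str) -> bool:
    """Return True if the captured output contains the success message."""
    return SUCCESS_MARKER in output_text


if __name__ == "__main__":
    out = Path("lightning.out")
    if out.exists() and import_succeeded(out.read_text(errors="replace")):
        print("import finished successfully")
    else:
        print("success message not found; inspect tidb-lightning.log")
```

If the message is missing, the log file configured under [lightning] is the place to look for the actual error.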

 

Limitations

Physical Import Mode Limitations

  • During the import process, the target TiDB cluster can provide only limited service to other workloads. This mode is designed for initializing empty target tables.
  • If your TiDB cluster has a latency-sensitive application and a low concurrency, it is strongly recommended that you do not use the physical import mode to import data into the cluster. This mode might have a significant impact on the online application.
  • No DDL or DML operations should be performed on the target tables during the import process, as this can lead to failures or data inconsistencies. Additionally, it is not recommended to perform read operations during the import.
  • The import process can significantly impact the performance of the cluster, and it is advised not to use multiple TiDB Lightning instances to import data to the same cluster unless using parallel import features.

 

Logical Import Mode Limitations
When using multiple TiDB Lightning instances, do not mix backends: do not run the physical and logical import modes simultaneously against the same cluster.
 

Author Bio

Abishek S

Abishek S is a MongoDB and TiDB Certified DBA at Mafiree with strong expertise in distributed databases, TiDB architecture, and cross-database consistency tools. He writes technical content focused on practical database solutions, data consistency verification, replication strategies, and performance optimization for modern data platforms. His work helps engineers and DBAs improve reliability and efficiency in real-world database operations.
