Ensuring data consistency during database migration and replication is a critical challenge, especially in distributed environments. Whether migrating from MySQL to TiDB or verifying data integrity between two TiDB clusters, maintaining accuracy is essential to prevent data loss or corruption.
This is where Sync-Diff-Inspector comes in. It is a powerful tool that helps compare and validate data across databases, detecting inconsistencies and generating SQL fixes to ensure data remains accurate.
In this guide, we’ll walk you through how to use Sync-Diff-Inspector for data validation, covering configuration details, real-world examples, and effective strategies to resolve inconsistencies.
Sync-diff-inspector
Sync-diff-inspector is a TiDB tool designed to compare and verify data consistency between two databases. It is particularly useful for ensuring data integrity after data migration processes, such as TiDB Data Migration (DM). The tool can identify the discrepancies between databases and generate SQL statements to repair the inconsistencies found.
Key features
Use Cases of Sync-Diff-Inspector in TiDB-DM
Database privileges for sync-diff-inspector
Upstream Database
Downstream Database
Data Validation using Sync-Diff-Inspector
Checking data counts in Source (MySQL) and Destination (TiDB)
MySQL
mysql> select count(*) from students;
+----------+
| count(*) |
+----------+
| 15|
+----------+
1 row in set (0.00 sec)
TiDB
mysql> select count(*) from test.students;
+----------+
| count(*) |
+----------+
| 14 |
+----------+
1 row in set (0.00 sec)The missing row in TiDB may be due to replication lag, sharding inconsistencies, or migration failures.
Sync-Diff-Inspector Config File Example
# Diff Configuration.
######################### Global config #########################
# The number of goroutines created to check data.
check-thread-count = 4
# If enabled, SQL statements is exported to fix inconsistent tables.
export-fix-sql = true
# Only compares the data instead of the table structure.
check-data-only = false
# Only compares the table structure instead of the data.
check-struct-only = false
# If enabled, sync-diff-inspector skips checking tables that do not exist in the upstream or downstream.
skip-non-existing-table = true
######################### Datasource config #########################
[data-sources]
[data-sources.mysql1] # mysql1 is the only custom ID for the database instance.
host = "127.0.0.1"
port = 3306
user = "root"
password = "*****"
[data-sources.tidb0]
host = "127.0.0.1"
port = 4000
user = "root"
password = “”
######################### task config #########################
# Configures the tables of the target database that need to be compared.
[task]
output-dir = "/home/tidb/output"
# The upstream database.
source-instances = ["mysql1"]
# The downstream database.
target-instance = “tidb0”
target-check-tables = ["test.students"]
Running the Validation
./sync_diff_inspector --config sync_diff.toml
A total of 1 tables need to be compared
Comparing the table structure of ``test`.`students`` ... equivalent
Comparing the table data of ``test`.`students`` ... failure
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
The data of `test`.`students` is not equal
The rest of tables are all equal.
A total of 1 tables have been compared, 0 tables finished, 1 tables failed, 0 tables skipped.
The patch file has been generated in
'/home/tidb/output/fix-on-tidb0/'
You can view the comparision details through ‘/home/tidb/output/sync_diff.log’
Analyzing the Comparison Results
After the check is finished, the sync-diff-inspector outputs a report file summary.txt located in output-dir = "/home/tidb/output".
Summary
Source Database
host = "127.0.0.1"
port = 3306
user = "root"
Target Databases
host = "127.0.0.1"
port = 4000
user = "root"
Comparison Result
The table structure and data in following tables are equivalent
The following tables contains inconsistent data
+-------------------+---------+--------------------+----------------+---------+-----------+
| TABLE | RESULT | STRUCTURE EQUALITY | DATA DIFF ROWS | UPCOUNT | DOWNCOUNT |
+-------------------+---------+--------------------+----------------+---------+-----------+
| `test`.`students` | succeed | true | +2/-1 | 15 | 14 |
+-------------------+---------+--------------------+----------------+---------+-----------+
Time Cost: 15.250426ms
Average Speed: 0.018823MB/s
Fixing Data Inconsistencies
If different rows exist during the data checking process, the SQL statements will be generated to fix them. The SQL file is located at ${output}/fix-on-${instance}, and ${instance} is the value of task.target-instance in the config.toml file.
-- table: test.students
-- range in sequence: Full
/*
DIFF COLUMNS ╏ `NAME`
╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╋╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍
source data ╏ 'Alice Smith'
╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╋╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍
target data ╏ 'Alicee Smith'
╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╋╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍
*/
REPLACE INTO `test`.`students`(`student_id`,`name`,`grade`) VALUES (101,'Alice Smith','A');
REPLACE INTO `test`.`students`(`student_id`,`name`,`grade`) VALUES (115,'Olivia Adams','A');
Best Practices for Using Sync-Diff-Inspector
Resource Management: Sync-Diff-Inspector consumes server resources while checking data. Avoid running it during peak business hours to prevent performance impact.
Collation Handling: When comparing MySQL and TiDB, ensure the collation settings are consistent, especially if primary or unique keys use VARCHAR. Mismatched collations can lead to incorrect validation results. Adjust collation settings in the configuration file as needed.
Data Chunking Accuracy: Sync-Diff-Inspector divides data into chunks based on TiDB statistics. Ensure accurate statistics by running ANALYZE TABLE {table_name} during low workload periods.
Table Mapping & Sharding: Pay close attention to table mapping rules. If a table in the source database has the same name as one in the target database, sharding may be applied by default. Properly configure schema-pattern and table-pattern to avoid unintended comparisons.
SQL Fix Verification: The generated SQL statements for data fixes serve as a reference. Always review and validate them before execution to prevent unintended modifications.
Restrictions
Conclusion
Sync-Diff-Inspector is your go-to tool for making sure data lines up perfectly across different databases, like MySQL and TiDB, or even between two TiDB setups. It spots any differences, checks if migrations went smoothly, and can even whip up SQL fixes, making data syncing rock solid for replication, migrations, and sharding.
To minimize risks, always review the generated SQL fixes before execution, monitor collation settings, and leverage TiDB statistics for improved accuracy. By integrating Sync-Diff-Inspector into your database management workflow, you can maintain data integrity and ensure smooth, reliable operations in your distributed database environment.
Miru IT Park, Vallankumaranvillai,
Nagercoil, Tamilnadu - 629 002.
Unit 303, Vanguard Rise,
5th Main, Konena Agrahara,
Old Airport Road, Bangalore - 560 017.
Call: +91 6383016411
Email: sales@mafiree.com