Schema changes on large MySQL tables can bring production systems to a halt if not handled correctly. This case study walks through how Mafiree helped a fintech client execute a zero-downtime MySQL schema migration on a 500M+ row production database — covering the real challenges faced, the three-phase tool strategy using gh-ost, pt-online-schema-change, and MySQL 8.0 INSTANT DDL, production configuration settings with performance benchmarks, and best practices for safely evolving your MySQL schema without impacting users
Jethish February 20, 2026
Every growing application eventually faces a defining moment: the database schema that served you well at launch no longer fits the evolving business logic. Adding a column, changing a data type, or creating an index sounds trivial in development, but on a production MySQL table with hundreds of millions of rows, a single ALTER TABLE can lock your application for hours and bring your service down.
This is the story of how Mafiree helped a fintech client migrate the schema of a 500M+ row MySQL table in production with zero downtime, zero data loss, and no impact on end-user experience. If you are looking for a battle-tested approach to MySQL schema migration, this guide breaks down the tools, trade-offs, and exact steps we followed.
MySQL’s default ALTER TABLE behavior rebuilds the entire table. For small tables, this completes in seconds. For tables with hundreds of millions of rows, the implications are severe:
Expert Tip: If your production MySQL tables exceed 10 million rows, never run a raw ALTER TABLE during business hours. Even MySQL 8.0’s Instant DDL has limitations that can catch teams off guard.
Our client, a fast-growing fintech company, operated a transaction processing system backed by MySQL 8.0 running on AWS RDS. Their core transactions table had grown to over 500 million rows and was serving 3,000+ queries per second at peak.
The business requirement was straightforward: add three new columns to support a regulatory compliance feature, modify an existing column’s data type from INT to BIGINT to accommodate larger transaction IDs, and add two composite indexes to support new reporting queries. The constraint was equally clear: zero downtime, no replication lag exceeding 5 seconds, and the migration had to complete within a single maintenance window.
The Schema Changes Required
-- Required changes on `transactions` table (500M+ rows)
ALTER TABLE transactions ADD COLUMN compliance_flag TINYINT DEFAULT 0, ADD COLUMN review_timestamp DATETIME NULL, ADD COLUMN audit_ref VARCHAR(64) NULL, MODIFY COLUMN txn_id BIGINT NOT NULL, ADD INDEX idx_compliance (compliance_flag, review_timestamp), ADD INDEX idx_audit (audit_ref, created_at);Running this as a single ALTER TABLE would have taken an estimated 6–8 hours based on our benchmarks on a staging replica, with full table lock for the MODIFY COLUMN operation. This was unacceptable.
Facing a similar challenge? Talk to our MySQL experts for a free migration assessment.
Before executing, Mafiree team has evaluated three industry-standard approaches for zero-downtime MySQL schema migration. Each has distinct strengths depending on your environment and the type of change required.
| Feature | gh-ost | pt-osc | MySQL 8.0 DDL |
| Mechanism | Binlog-based replication | Trigger-based shadow table | Native InnoDB engine |
| Trigger Dependency | No triggers needed | Requires 3 triggers | No triggers |
| Throttling | Built-in, replica-aware | Manual configuration | No throttling |
| Rollback | Pause/abort anytime | Drop shadow table | Limited (depends on op) |
| Foreign Key Support | Limited | Better support | Full native support |
| Best For | Large tables, complex ops | Legacy MySQL versions | Simple adds/drops |
After thorough evaluation, Mafiree Team adopted a phased approach, using different tools for different operations based on their risk profiles. This hybrid strategy minimized risk while maximizing efficiency.
Phase 1: Adding New Columns with MySQL 8.0 INSTANT DDL
MySQL 8.0 supports the INSTANT algorithm for adding columns at the end of a table. This modifies only metadata and completes in milliseconds, regardless of table size.
-- Phase 1: Instant column additions (< 1 second each)
ALTER TABLE transactions ADD COLUMN compliance_flag TINYINT DEFAULT 0, ALGORITHM=INSTANT; ALTER TABLE transactions ADD COLUMN review_timestamp DATETIME NULL, ALGORITHM=INSTANT; ALTER TABLE transactions ADD COLUMN audit_ref VARCHAR(64) NULL, ALGORITHM=INSTANT;Result: All three columns added in under 500 milliseconds total. Zero replication lag. Zero lock contention.
Expert Tip: INSTANT DDL in MySQL 8.0 only works for adding columns at the end of the table, adding or dropping virtual columns, and a few other metadata-only operations. Always verify with ALGORITHM=INSTANT explicitly; if the operation cannot be done instantly, MySQL will throw an error rather than silently falling back to a table-copy operation.
Phase 2: Column Type Change with gh-ost
The MODIFY COLUMN from INT to BIGINT required a full table rebuild. This was our highest-risk operation. We chose gh-ost for its binlog-based approach, which avoids triggers entirely and provides real-time throttling based on replica lag.
gh-ost \ --host=primary-mysql.internal \ --database=fintech_prod \ --table=transactions \ --alter="MODIFY COLUMN txn_id BIGINT NOT NULL" \ --chunk-size=2500 \ --max-lag-millis=4000 \ --throttle-control-replicas=replica1.internal,replica2.internal \ --critical-lag-millis=5000 \ --nice-ratio=0.5 \ --cut-over=default \ --exact-rowcount \ --concurrent-rowcount \ --execute
Key Configuration Decisions
chunk-size=2500: Mafiree team has started conservatively. On a table doing 3,000 QPS, smaller chunks keep the I/O overhead per cycle low and avoid overwhelming the buffer pool.
max-lag-millis=4000: gh-ost automatically pauses migration if any monitored replica exceeds 4 seconds of lag, keeping us well within our 5-second SLA.
nice-ratio=0.5: For every chunk copied, gh-ost waits an equal duration before the next chunk. This effectively halves the migration speed but ensures production queries are not starved.
cut-over=default: The final table swap uses an atomic RENAME TABLE operation with a brief metadata lock, typically completing in under 50 milliseconds.
Result: The migration ran for 4 hours and 22 minutes. Replica lag never exceeded 2.8 seconds. Application latency increased by only 3ms at the p99 level during peak migration throughput.
Phase 3: Index Creation with pt-online-schema-change
For the two new composite indexes, Mafiree team has used pt-online-schema-change. Index creation is generally safer than column modifications, and pt-osc’s trigger-based approach handles it reliably.
pt-online-schema-change \ --alter "ADD INDEX idx_compliance (compliance_flag, review_timestamp), \ ADD INDEX idx_audit (audit_ref, created_at)" \ --host=primary-mysql.internal \ D=fintech_prod,t=transactions \ --chunk-size=5000 \ --max-lag=4 \ --check-interval=2 \ --progress=time,30 \ --executeResult: Both indexes created in 3 hours 15 minutes with zero impact on application performance.
➤ Need help planning your MySQL schema migration? Request a free assessment
Based on hundreds of schema migrations across our client base, here are the production-tested best practices that prevent downtime and data loss.
1. Always Dry-Run on a Staging Replica First
Both gh-ost and pt-osc support dry-run modes. Never skip this step. A dry run on a replica with production-equivalent data reveals locking issues, unexpected I/O patterns, and accurate time estimates that pure theory cannot provide.
# gh-ost dry run (does everything except the final cut-over) gh-ost --host=staging-replica.internal \ --database=fintech_prod --table=transactions \ --alter="MODIFY COLUMN txn_id BIGINT NOT NULL" \ --test-on-replica --execute2. Monitor Active Sessions Before Cut-Over
The most common cause of cut-over failure is not the tool but a long-running query or open session holding a metadata lock on the target table. In the minutes before cut-over, actively check for blocking sessions.
-- Check for sessions holding locks on the target table
SELECT * FROM performance_schema.metadata_locks WHERE OBJECT_NAME = 'transactions' AND LOCK_STATUS = 'GRANTED';
-- Identify long-running queries
SELECT ID, TIME, STATE, INFO FROM information_schema.PROCESSLIST WHERE TIME > 30 AND DB = 'fintech_prod';3. Stage Schema Changes Ahead of Application Code
Decouple schema changes from application deployments. Apply schema changes first, then deploy code that uses the new columns or indexes. This ensures backward compatibility at every step: old application code works with the new schema, and new application code works with the new schema. If the code deployment fails, your schema change is still safe.
4. Plan for Rollback at Every Phase
Every migration phase should have a documented rollback plan. For gh-ost, you can abort mid-migration and the original table remains untouched. For pt-osc, dropping the shadow table reverts the process. For INSTANT DDL additions, a simple ALTER TABLE ... DROP COLUMN reverses the change. The key is knowing your rollback path before you start.
5. Monitor Replication Lag Continuously
Use dedicated monitoring that tracks replication lag at sub-second granularity throughout the migration window. Mafiree’s managed MySQL monitoring collects metrics every 5 minutes and provides real-time alerting on lag spikes, I/O saturation, and query latency anomalies during migration windows.
Use Native MySQL DDL When:
The operation is supported by ALGORITHM=INSTANT (adding columns at the end, dropping columns in MySQL 8.0.29+, renaming columns). The change is metadata-only and completes in milliseconds. You have verified the operation on a staging environment with ALGORITHM=INSTANT explicitly specified.
Use gh-ost When:
The table has more than 50 million rows and the operation requires a full table rebuild (column type changes, character set conversions). You need granular throttling based on replica lag. The table has existing triggers that conflict with pt-osc’s trigger-based approach.
Use pt-online-schema-change When:
You are running older MySQL versions (5.6, 5.7) where gh-ost’s binlog requirements may not be met. The table has foreign key relationships that need to be preserved during migration. You prefer a well-tested, widely-deployed tool with simpler configuration.
Zero-downtime MySQL schema migration is not about finding one magic tool. It is about understanding your specific change, choosing the right tool for each operation, and following a disciplined process of testing, monitoring, and staged execution. By splitting our client’s migration into three targeted phases, we turned what could have been an 8-hour outage into a seamless background operation.
The tools exist. The patterns are proven. What separates a smooth migration from a stressful incident is planning, expertise, and real-time monitoring. If you are scaling your MySQL environment and need professional guidance, Mafiree is here to help.
Ready to make your next MySQL schema migration stress-free? Request a free migration assessment today and discover how our team can help you evolve your database without downtime.
Miru IT Park, Vallankumaranvillai,
Nagercoil, Tamilnadu - 629 002.
Unit 303, Vanguard Rise,
5th Main, Konena Agrahara,
Old Airport Road, Bangalore - 560 017.
Call: +91 6383016411
Email: sales@mafiree.com