Zero-Downtime Database Migrations: A Field Guide
Deploying a breaking schema change to a live database is one of the most stressful events for any backend team. Traditional maintenance windows are no longer acceptable for high-availability services.
The Expand and Contract Pattern
Zero-downtime migrations rely on the Expand and Contract pattern. Instead of modifying a column in place, the migration proceeds through four phases, each of which stays compatible with the application code already running in production.
Phase 1: Expand
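Add the new schema elements (e.g., a new column) alongside the old ones. The new column should be nullable or have a default, so that code which does not yet know about it can keep inserting rows. The application code is then updated to write to both the old and new columns while still reading from the old one.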
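As a rough illustration of the expand step and dual writes, here is a minimal sketch using Python's built-in sqlite3 module. The users table and the full_name (old) and display_name (new) columns are hypothetical names chosen for this example, and a production database would use its own driver and migration tooling.

```python
import sqlite3

# Hypothetical schema: full_name is the old column, display_name the new one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
conn.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")
conn.commit()

# Expand: the new column is nullable, so INSERTs from old code keep working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def create_user(conn, name):
    """Dual write: populate both the old and the new column."""
    cur = conn.execute(
        "INSERT INTO users (full_name, display_name) VALUES (?, ?)",
        (name, name),
    )
    conn.commit()
    return cur.lastrowid


def get_user_name(conn, user_id):
    """During this phase, reads still come from the old column."""
    row = conn.execute(
        "SELECT full_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


new_id = create_user(conn, "Grace Hopper")
```

Rows written before the expand step (such as the first row here) still have an empty display_name, which is what the backfill phase addresses.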
Phase 2: Backfill
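Run a background script to copy existing data from the old schema structure to the new one. Processing the rows in small, throttled batches lets the job run for hours or days without holding long locks or competing with production traffic. Because the application is already writing to both columns, the script should only fill rows where the new column is still empty.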
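One possible shape for such a backfill script is sketched below, again with sqlite3 and the hypothetical users, full_name, and display_name names. It walks the table in primary-key order, commits after every batch, and skips rows that already have a value. The batch size and pause are illustrative defaults, not recommendations.

```python
import sqlite3
import time


def backfill_display_name(conn, batch_size=1000, pause_seconds=0.0):
    """Copy full_name into display_name batch by batch; return rows updated."""
    last_id = 0
    total = 0
    while True:
        # Keyset pagination: fetch the next slice of primary keys.
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return total
        # Only touch rows the dual-writing application has not filled yet.
        cur = conn.execute(
            "UPDATE users SET display_name = full_name "
            "WHERE id BETWEEN ? AND ? AND display_name IS NULL",
            (ids[0], ids[-1]),
        )
        conn.commit()  # one short transaction per batch
        total += cur.rowcount
        last_id = ids[-1]
        time.sleep(pause_seconds)  # throttle between batches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, "
    "full_name TEXT NOT NULL, display_name TEXT)"
)
# Five legacy rows, plus one row the application has already dual-written.
conn.executemany(
    "INSERT INTO users (full_name) VALUES (?)",
    [(f"user-{i}",) for i in range(5)],
)
conn.execute(
    "INSERT INTO users (full_name, display_name) VALUES ('user-5', 'already-set')"
)
conn.commit()

updated = backfill_display_name(conn, batch_size=2)
```

Because the update is restricted to rows where display_name is still NULL, the script can be stopped and restarted at any point without overwriting newer values.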
Phase 3: Transition Read
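Once the backfill is complete and verified, update the application code to read from the new column. Keep writing to both columns for now, so that the read path can be switched back to the old column if a problem appears.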
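One way to make the read switch easy to reverse is to put it behind a flag. The sketch below assumes a hypothetical READ_FROM_NEW_COLUMN setting and the same illustrative users table. A real service would more likely take the value from its configuration or feature-flag system.

```python
import sqlite3

# Hypothetical flag; set it to False to roll the read path back.
READ_FROM_NEW_COLUMN = True


def get_user_name(conn, user_id, read_new=READ_FROM_NEW_COLUMN):
    """Read the name from the new column, or from the old one on rollback."""
    # The column name comes from two fixed literals, never from user input.
    column = "display_name" if read_new else "full_name"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, "
    "full_name TEXT NOT NULL, display_name TEXT)"
)
# The two values differ here only to make the selected column visible.
conn.execute(
    "INSERT INTO users (full_name, display_name) VALUES ('old value', 'new value')"
)
conn.commit()
```

With the flag on, get_user_name(conn, 1) returns the value stored in display_name. Passing read_new=False returns the value from full_name, which is the rollback path.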
Phase 4: Contract
Remove the application code that writes to the old column. Once no deployed version of the application reads or writes the old column, drop it from the database.
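The contract step might look roughly like the sketch below, which refuses to drop the old column while any row still lacks a value in the new one. The table and column names remain hypothetical. The old column is declared nullable here because a NOT NULL column with no default would reject inserts as soon as the application stops writing to it. In SQLite, ALTER TABLE ... DROP COLUMN requires version 3.35 or newer, so the demo checks the library version first.

```python
import sqlite3


def create_user(conn, name):
    """After the contract step, writes touch only the new column."""
    cur = conn.execute("INSERT INTO users (display_name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def drop_old_column(conn):
    """Drop full_name, but only once every row has a display_name."""
    missing = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
    ).fetchone()[0]
    if missing:
        raise RuntimeError(
            f"{missing} rows not backfilled; refusing to drop full_name"
        )
    conn.execute("ALTER TABLE users DROP COLUMN full_name")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, display_name TEXT)"
)
conn.execute(
    "INSERT INTO users (full_name, display_name) "
    "VALUES ('Ada Lovelace', 'Ada Lovelace')"
)
conn.commit()
new_id = create_user(conn, "Grace Hopper")

# ALTER TABLE ... DROP COLUMN needs SQLite 3.35 or newer.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    drop_old_column(conn)

columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

Running the safety check immediately before the drop is a design choice for this sketch. The same query can also be run earlier as a verification step once the backfill finishes.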
Using this multi-step approach together with robust CI/CD pipelines, we have executed dozens of high-stakes database migrations on mission-critical services without dropping a single user request.