Cloud migration cutover planning - zero downtime approach

The migration itself is rarely the hard part. The hard part is the cutover - the point at which production traffic shifts from the old environment to the new one. Done poorly, it creates hours of downtime, data inconsistencies, and incidents that erode organizational confidence in the migration program. Done well, it is a non-event.

Quantus IT has executed cloud migrations for organizations across financial services, manufacturing, and distribution, including a global order ingestion system that handled millions of transactions without a production outage during migration. The methodology described here is the one we apply on those engagements.

Why "We'll Figure Out Cutover Later" Fails

The most common cutover failure mode is treating it as a final step rather than a constraint that shapes the entire migration design. When cutover planning begins late, teams discover that the architecture they built does not support the parallel-run period they need, or that rollback would require manual data reconciliation that was not accounted for. Downtime that could have been avoided becomes a negotiated compromise.

Cutover strategy should be defined during the migration design phase, before any workloads move. The chosen cutover pattern determines the required infrastructure, the data synchronization approach, the required testing duration, and the rollback procedure - all of which affect migration cost and timeline.

The Four Cutover Patterns

Enterprise migrations typically use one of four cutover patterns, selected based on workload characteristics and downtime tolerance:

  • Parallel run with traffic split: Both old and new environments run simultaneously; traffic is gradually shifted (10%, 25%, 50%, 100%) using a load balancer or traffic manager. Ideal for stateless services. Requires the application to function correctly with split traffic.
  • Blue-green deployment: The new environment (green) is fully built and validated before traffic is switched from the old (blue). The traffic switch takes minutes, and rollback is a matter of switching back, provided the old environment's data is still current. Requires running two full environments during the cutover window, roughly doubling infrastructure cost, but eliminates gradual-transition complexity.
  • Canary release: A small percentage of users (typically 1-5%) are routed to the new environment for a defined period. After validation, the percentage increases. Appropriate for high-traffic systems where gradual validation is more important than speed.
  • Scheduled maintenance window: For systems that cannot run in parallel (typically due to data model changes or stateful dependencies), use a planned downtime window with a pre-communicated rollback timer. Set the timer so that, if it expires, the rollback procedure can still complete within the announced SLA.
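
The staged traffic shift used by the parallel-run and canary patterns can be sketched in a few lines. This is a minimal illustration, not a description of how any particular load balancer or traffic manager implements weighting: the names STAGES, bucket, and route are hypothetical, and the stage percentages are simply the ones listed above. The sketch assumes each request carries a stable key (such as a customer or session ID), so that a user who lands on the new environment at 10% stays there at 25%, 50%, and 100%.

```python
import hashlib

# Hypothetical stage schedule, taken from the percentages mentioned above.
STAGES = [10, 25, 50, 100]


def bucket(request_key: str) -> int:
    """Map a stable key (e.g. a customer or session ID) to a bucket in 0..99."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(request_key: str, new_env_percent: int) -> str:
    """Return which environment should serve this key at the current stage."""
    return "new" if bucket(request_key) < new_env_percent else "old"


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]
    for percent in STAGES:
        share = sum(route(k, percent) == "new" for k in keys) / len(keys)
        print(f"stage {percent:>3}% -> observed share on new: {share:.1%}")
```

Hashing the key rather than picking randomly per request keeps routing sticky across stages, which matters when the application holds any per-user state. Rolling back a stage is the same operation with a lower percentage.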

Data Synchronization During Parallel Run

The cutover window is manageable when the workload is stateless or its data is read-only. When both environments accept writes during a parallel run, data synchronization becomes the primary technical challenge. Options, in order of increasing complexity:

  • Single shared database: Both environments write to the old database during parallel run; the new environment transitions to its own database only at full cutover. Simplest approach, but the old database must support the new schema.
  • Change data capture (CDC): A replication layer streams changes from the old database to the new in near-real time. Azure Data Factory, Debezium, or Azure SQL Data Sync can implement this. Enables a short cutover window because data lag is minimal.
  • Event replay: For event-driven architectures, the new environment can replay events from a message queue or event store to catch up to the current state before traffic switches. This is the pattern Quantus IT used for the global order ingestion migration, which achieved a zero-downtime cutover while supporting 5x more endpoints than the original system.
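
The event replay option can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: Event, NewEnvironment, replay, and ready_for_cutover are hypothetical names, the log is a plain Python list standing in for a message queue or event store, and real systems would read offsets from the broker rather than from a list. The point it shows is that replay should be idempotent and checkpointed, so it can be restarted safely and re-run until the lag reaches zero.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    offset: int      # position in the log, strictly increasing
    order_id: str
    status: str


@dataclass
class NewEnvironment:
    """Hypothetical stand-in for the new environment's state store."""
    orders: dict = field(default_factory=dict)
    last_applied_offset: int = -1

    def apply(self, event: Event) -> None:
        # Idempotent: skip events at or below the checkpoint so a restarted
        # replay never applies the same event twice.
        if event.offset <= self.last_applied_offset:
            return
        self.orders[event.order_id] = event.status
        self.last_applied_offset = event.offset


def replay(log: list, target: NewEnvironment) -> int:
    """Apply every event past the target's checkpoint; return the remaining lag."""
    for event in log:
        target.apply(event)
    latest = log[-1].offset if log else -1
    return latest - target.last_applied_offset


def ready_for_cutover(lag: int, max_lag: int = 0) -> bool:
    """Go/no-go check: traffic should switch only once the new side has caught up."""
    return lag <= max_lag


if __name__ == "__main__":
    log = [
        Event(0, "A-100", "received"),
        Event(1, "B-200", "received"),
        Event(2, "A-100", "shipped"),
    ]
    env = NewEnvironment()
    lag = replay(log, env)
    print("lag:", lag, "ready:", ready_for_cutover(lag))
```

In practice the old environment keeps producing events while the replay runs, so the replay is repeated (or left streaming) until the lag stays at or below the agreed threshold, and only then does traffic switch.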

The Rollback Plan

A zero-downtime migration without a tested rollback plan is not zero-downtime - it is optimism. The rollback plan must satisfy three requirements:

  • Speed: Rollback must complete within the same downtime tolerance as the original cutover. If you are committing to zero downtime, rollback must also be zero-downtime or near-zero.
  • Data integrity: Define the behavior for transactions that were processed in the new environment before rollback. Will they be lost, replayed, or manually reconciled? The answer depends on the business tolerance for each scenario.
  • Tested: Rollback procedures that have never been executed in a pre-production environment are likely to fail in production. Test the rollback at least once before the production cutover window.
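
The data integrity requirement can be made concrete by deciding, per transaction type, what happens to work the new environment accepted before rollback. The sketch below is one hypothetical way to express that decision: the Disposition values mirror the three outcomes named above (replayed, manually reconciled, lost), while the POLICY table, the transaction types in it, and the function plan_reconciliation are invented for illustration. The actual policy is a business decision, not a technical default.

```python
from collections import defaultdict
from enum import Enum


class Disposition(Enum):
    REPLAY = "replay into old environment"
    MANUAL = "manual reconciliation"
    DISCARD = "accept loss"


# Hypothetical policy table; business owners decide this per transaction type.
POLICY = {
    "order": Disposition.REPLAY,
    "payment": Disposition.MANUAL,
    "page_view": Disposition.DISCARD,
}


def plan_reconciliation(new_env_txns, old_env_ids):
    """Group transactions seen only by the new environment by their disposition.

    new_env_txns: iterable of (txn_id, txn_type) written after the traffic switch
    old_env_ids:  set of txn_ids the old environment already holds
    """
    plan = defaultdict(list)
    for txn_id, txn_type in new_env_txns:
        if txn_id in old_env_ids:
            continue  # already present in the old environment, nothing to do
        # Unknown types default to manual review rather than silent loss.
        plan[POLICY.get(txn_type, Disposition.MANUAL)].append(txn_id)
    return dict(plan)
```

Running this against pre-production data during the rollback test gives an early estimate of how much manual reconciliation a real rollback would create.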

The Cutover Runbook

Every zero-downtime migration requires a step-by-step cutover runbook with a clear owner for each step, defined success criteria, and explicit go/no-go decision points. The runbook should include:

  • Pre-cutover validation checklist (health checks, data reconciliation counts, performance benchmarks)
  • Traffic shift steps with the exact commands or UI actions required
  • Observation period with specific metrics to monitor and thresholds for rollback trigger
  • Rollback procedure with the same level of step-by-step detail as the forward path
  • Communication plan for each phase (who is notified when traffic shifts, when validation passes, when cutover is complete)
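
A runbook with owners, success criteria, and go/no-go points can also be expressed as data, which makes it easier to rehearse. The following is a minimal sketch under stated assumptions: Step, breached, and execute are hypothetical names, the actions are placeholders for the real commands or UI actions, and the metric names and thresholds used with it would come from your own monitoring. It is not a substitute for the written runbook, only a way to show its structure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    owner: str
    action: Callable[[], None]      # the command or UI action, wrapped as a callable
    success: Callable[[], bool]     # the defined success criterion for this step
    decision_point: bool = False    # True = explicit go/no-go before running


def breached(metrics: dict, thresholds: dict) -> list:
    """Names of metrics above their rollback threshold.

    A metric that is missing from the readings counts as breached.
    """
    return [
        name for name, limit in thresholds.items()
        if metrics.get(name, float("inf")) > limit
    ]


def execute(steps, approve):
    """Run steps in order and return (completed, log).

    A False result means the forward path stopped and the rollback
    procedure should be started.
    """
    log = []
    for step in steps:
        if step.decision_point and not approve(step):
            log.append(f"NO-GO before '{step.name}' (owner: {step.owner})")
            return False, log
        step.action()
        if not step.success():
            log.append(f"FAILED '{step.name}' (owner: {step.owner})")
            return False, log
        log.append(f"OK '{step.name}'")
    return True, log
```

An observation step fits this shape by using a success check such as "not breached(current_metrics, thresholds)", so that a threshold breach stops the forward path at a known point with a named owner.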

For organizations planning an upcoming migration, Quantus IT's Cloud Migrations practice delivers the full migration design, architecture, and cutover planning as an integrated engagement. Contact us to discuss your timeline and requirements.
