Most businesses with mission-critical workloads have a two-fold disaster recovery solution in place that 1) replicates data to a secondary location, and 2) enables failover to that location in the event of an outage. For BigQuery, that solution takes the shape of BigQuery Managed Disaster Recovery. But the risk of data loss while testing a disaster recovery event remains a primary concern. Like traditional “hard failover” solutions, BigQuery Managed Disaster Recovery has forced a difficult choice: promote the secondary immediately and risk losing any data within the Recovery Point Objective (RPO), or delay recovery while you wait for a primary region that may never come back online.

Today, we’re addressing this directly with the introduction of soft failover in BigQuery Managed Disaster Recovery. Soft failover logic promotes the secondary region’s compute and datasets only after replication has been confirmed to be complete, providing you with full control over disaster recovery transitions, and minimizing the risk of data loss during a planned failover.

Figure 1: Comparing hard vs. soft failover

Summary of differences between hard failover and soft failover

| | Hard failover | Soft failover |
|---|---|---|
| Use case | Unplanned outages, region down | Failover testing; requires primary and secondary to both be available |
| Failover timing | As soon as possible, ignoring any pending replication between primary and secondary; data loss possible | Subject to primary and secondary acquiescing, minimizing potential for data loss |
| RPO/RTO | 15 minutes / 5 minutes* | N/A |

*Supported objective depending on configuration
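
To make the “Use case” row concrete, here is a minimal sketch of how a runbook might pick a mode from region availability. The `FailoverMode` enum and `choose_failover_mode` helper are hypothetical names made up for illustration; they are not part of any BigQuery API, and the rule that soft failover is preferred whenever both regions are up is an assumption drawn from the table.

```python
from enum import Enum


class FailoverMode(Enum):
    HARD = "hard"
    SOFT = "soft"


def choose_failover_mode(primary_available: bool, secondary_available: bool) -> FailoverMode:
    """Pick a failover mode from region availability.

    Illustrative helper only; it does not call BigQuery.
    """
    if not secondary_available:
        # Neither mode can promote a secondary that is not reachable.
        raise ValueError("secondary region must be available to fail over")
    if primary_available:
        # Both regions are up (for example, a DR drill): wait for replication.
        return FailoverMode.SOFT
    # Primary is down: promote right away and accept possible data loss.
    return FailoverMode.HARD


if __name__ == "__main__":
    print(choose_failover_mode(primary_available=True, secondary_available=True))
    print(choose_failover_mode(primary_available=False, secondary_available=True))
```

In this sketch, a drill with both regions healthy resolves to soft failover, while a primary outage resolves to hard failover.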

BigQuery soft failover in action 

Imagine a large financial services company, “SecureBank,” which uses BigQuery for its mission-critical analytics and reporting. SecureBank requires a reliable Recovery Time Objective (RTO) and a 15-minute Recovery Point Objective (RPO) for its primary BigQuery datasets, as robust disaster recovery is a top priority. They regularly conduct DR drills with BigQuery Managed DR to ensure compliance and readiness for unforeseen outages.

Before the introduction of soft failover in BigQuery Managed DR, SecureBank faced a dilemma over how to perform its DR drills. While BigQuery Managed DR handled the failover of compute and associated datasets, conducting a full “hard failover” drill meant accepting the risk of up to 15 minutes of data loss if replication wasn’t complete when the failover was initiated, or significant operational disruption if they first manually verified data synchronization across regions. This often led to less realistic or more complex drills, consuming valuable engineering time and causing anxiety.

New solution: 

With soft failover in BigQuery Managed DR, administrators have several options for failover procedures. Unlike hard failover for unplanned outages, soft failover initiates failover only after all data is replicated to the secondary region, to help guarantee data integrity.
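
The difference between the two modes can be illustrated with a toy model. This is only a sketch: the `ReplicaPair` class, the commit counters, and the batch-based catch-up loop are assumptions invented for the example, and it assumes writes to the primary are paused while replication drains. It does not reflect how BigQuery implements replication internally.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPair:
    """Toy model of a primary/secondary pair (not a BigQuery API)."""
    primary_commits: int        # writes committed on the primary
    replicated_commits: int     # writes already copied to the secondary
    primary_available: bool = True

    @property
    def lag(self) -> int:
        """Writes that exist on the primary but not yet on the secondary."""
        return self.primary_commits - self.replicated_commits


def hard_failover(pair: ReplicaPair) -> int:
    """Promote immediately; return the number of writes left behind."""
    return pair.lag


def soft_failover(pair: ReplicaPair, batch: int = 10, max_rounds: int = 100) -> int:
    """Promote only after replication catches up; return writes left behind."""
    if not pair.primary_available:
        raise RuntimeError("soft failover needs both regions to be available")
    rounds = 0
    while pair.lag > 0:
        if rounds >= max_rounds:
            # Replication never finished, so the secondary is not promoted.
            raise TimeoutError("replication did not complete")
        pair.replicated_commits += min(batch, pair.lag)
        rounds += 1
    return pair.lag  # zero once the loop exits


if __name__ == "__main__":
    pair = ReplicaPair(primary_commits=1000, replicated_commits=970)
    print("hard failover would lose:", hard_failover(pair))
    print("soft failover loses:", soft_failover(pair))
```

With 30 writes still in flight, the hard path would promote and leave them behind, while the soft path drains the backlog first and promotes with nothing outstanding.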

Figure 2: Soft Failover Mode Selection

Figure 3: Disaster recovery reservations

Figure 4: Replication status / Failover details

The BigQuery soft failover feature is available today via the BigQuery UI, DDL, and CLI, providing enterprise-grade control for disaster recovery, confident simulations, and compliance, without risking data loss during testing. Get started today to maintain uptime, prevent data loss, and test scenarios safely.