AZ-305 - Design a Site Recovery Strategy

1. Business Continuity Concepts

Business continuity planning ensures that critical business functions can continue during and after a disaster. Two fundamental metrics drive every recovery design: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO).

RPO (Recovery Point Objective)

RPO defines the maximum acceptable amount of data loss measured in time. An RPO of 1 hour means you can tolerate losing up to 1 hour of data. The lower the RPO, the more frequent the replication or backup must be, which increases cost and complexity.

RTO (Recovery Time Objective)

RTO defines the maximum acceptable downtime after a disruption. An RTO of 4 hours means the application must be restored and operational within 4 hours of an outage. Reducing RTO typically requires standby infrastructure and automated failover mechanisms.
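
As a minimal sketch of how the two metrics are measured, the following Python computes the data loss and downtime for a single incident and compares them with the 1-hour RPO and 4-hour RTO used in the examples above. The function names and the incident timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss(last_recovery_point: datetime, outage_start: datetime) -> timedelta:
    """Data written after the last recovery point is lost (compare with RPO)."""
    return outage_start - last_recovery_point


def downtime(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Time the application was unavailable (compare with RTO)."""
    return service_restored - outage_start


def meets_objectives(loss: timedelta, down: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the RPO and the RTO were met."""
    return loss <= rpo and down <= rto


# Hypothetical incident: last recovery point 09:50, outage 10:00, restored 13:30.
loss = data_loss(datetime(2024, 1, 1, 9, 50), datetime(2024, 1, 1, 10, 0))
down = downtime(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 13, 30))
within_targets = meets_objectives(loss, down,
                                  rpo=timedelta(hours=1), rto=timedelta(hours=4))
```

Here the incident loses 10 minutes of data and causes 3.5 hours of downtime, so both targets are met. Tightening the RPO to 5 minutes would make the same incident a miss.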

Business Impact Analysis

Before selecting a recovery strategy, perform a business impact analysis (BIA) to classify workloads by criticality. Tier-1 workloads (mission-critical) require the lowest RPO/RTO and warrant hot standby configurations. Tier-2 workloads (business-important) may tolerate hours of RTO with warm standby. Tier-3 workloads (non-critical) can use cold standby with longer recovery windows.
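
The tiering step can be sketched as a simple classification over each workload's RPO and RTO. The thresholds below (15 minutes, 1 hour, 8 hours) are assumptions made for this example; a real BIA sets its own cut-offs per organization. The `Workload` type and `classify_tier` function are likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Workload:
    name: str
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable downtime


def classify_tier(w: Workload) -> str:
    """Map a workload to a standby tier using illustrative thresholds."""
    # Assumed cut-offs, not values published by Microsoft.
    if w.rpo <= timedelta(minutes=15) and w.rto <= timedelta(hours=1):
        return "Tier-1 (hot standby)"
    if w.rto <= timedelta(hours=8):
        return "Tier-2 (warm standby)"
    return "Tier-3 (cold standby)"


payments = Workload("payments", rpo=timedelta(minutes=5), rto=timedelta(minutes=30))
reporting = Workload("reporting", rpo=timedelta(hours=4), rto=timedelta(hours=6))
archive = Workload("archive", rpo=timedelta(hours=24), rto=timedelta(hours=48))
tiers = {w.name: classify_tier(w) for w in (payments, reporting, archive)}
```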

2. Azure Site Recovery (ASR)

What is Azure Site Recovery?

Azure Site Recovery (ASR) is the native Azure disaster recovery service. It orchestrates replication, failover, and recovery of workloads to ensure business continuity. ASR can replicate Azure VMs between regions, on-premises VMs to Azure, and on-premises VMs to a secondary datacenter.

ASR Replication Architecture

For Azure-to-Azure replication, ASR uses a source region and a target region. The Mobility service agent on each VM captures disk writes and sends them to a cache storage account in the source region. From there, data is replicated to managed disks (replica disks) in the target region. Recovery points are generated from the replicated data at configurable intervals.

ASR Components

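Recovery Services Vault: The management container for ASR configurations, replication policies, and recovery plans. For Azure-to-Azure replication the vault cannot be in the source region; it is typically created in the target region so that it stays reachable during a source-region outage.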
Replication Policy: Defines the recovery point retention period (default 24 hours), app-consistent snapshot frequency, and crash-consistent recovery point interval (every 5 minutes).
Recovery Plan: Groups machines into ordered steps for failover. You can add scripts or manual actions between groups to handle dependencies.
Mobility Service: An agent installed on each replicated VM that captures disk writes and facilitates replication.
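
To make the replication policy numbers concrete, the arithmetic below estimates how many recovery points fit in the retention window. It assumes one point per interval and no thinning of older points, so treat the result as an upper bound; the service may keep fewer. The 1-hour app-consistent frequency is the value quoted in the Exam Tips, and `max_recovery_points` is a hypothetical helper.

```python
from datetime import timedelta


def max_recovery_points(retention: timedelta, interval: timedelta) -> int:
    """Upper bound on recovery points kept if one is created per interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    # Dividing one timedelta by another with // yields a whole number.
    return retention // interval


# 24-hour retention with a crash-consistent point every 5 minutes.
crash_consistent = max_recovery_points(timedelta(hours=24), timedelta(minutes=5))
# 24-hour retention with an app-consistent snapshot every hour.
app_consistent = max_recovery_points(timedelta(hours=24), timedelta(hours=1))
```

Under these assumptions the window holds at most 288 crash-consistent points and 24 app-consistent snapshots.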

Supported Workloads

Workload Type	Source	Target	Notes
Azure VMs	Azure Region A	Azure Region B	Native Azure-to-Azure replication
VMware VMs	On-premises	Azure	Requires configuration server and process server
Hyper-V VMs	On-premises	Azure	Supports with or without System Center VMM
Physical Servers	On-premises	Azure	Windows and Linux physical servers supported
AWS EC2 Instances	AWS	Azure	Treated as physical servers for migration

3. Failover Types

Test Failover

A test failover validates your replication and recovery plan without impacting production. VMs are created in an isolated Azure virtual network using a selected recovery point. Production replication continues uninterrupted during the test. After validation, you clean up the test environment. Microsoft recommends performing test failovers at least every 90 days.

Planned Failover

A planned failover is used for expected events such as scheduled maintenance or anticipated regional issues. The source VMs are shut down first to ensure zero data loss (RPO of zero). All pending data is replicated to the target before the failover completes. After the planned event, you perform a planned failback to the original region.

Planned Failover Key Point

Because the source is shut down before failover begins and all pending changes are replicated, planned failover guarantees zero data loss. Of the two production failover types, it is the only one that achieves an RPO of zero.

Unplanned Failover (Forced Failover)

An unplanned failover is triggered when the source region experiences an unexpected outage. Since the source is unavailable, pending replication data may be lost (data loss up to the RPO). You select a recovery point (latest, latest app-consistent, or a specific point in time) and failover proceeds using the replicated data in the target region.
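
The three recovery point choices can be sketched as a selection over a list of available points. This is an illustrative model only, not the ASR API: the `RecoveryPoint` type and the mode names are hypothetical, and the point-in-time mode is assumed to mean "the newest point at or before the requested time".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryPoint:
    taken_at: datetime
    app_consistent: bool


def select_recovery_point(points: Sequence[RecoveryPoint], mode: str,
                          at: Optional[datetime] = None) -> RecoveryPoint:
    """Pick a recovery point by mode: latest, latest_app_consistent, point_in_time."""
    ordered = sorted(points, key=lambda p: p.taken_at)
    if mode == "latest":
        candidates = ordered
    elif mode == "latest_app_consistent":
        candidates = [p for p in ordered if p.app_consistent]
    elif mode == "point_in_time":
        if at is None:
            raise ValueError("point_in_time requires a timestamp")
        candidates = [p for p in ordered if p.taken_at <= at]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not candidates:
        raise ValueError("no recovery point matches the request")
    return candidates[-1]  # newest point among the candidates


points = [
    RecoveryPoint(datetime(2024, 1, 1, 9, 0), app_consistent=True),
    RecoveryPoint(datetime(2024, 1, 1, 9, 55), app_consistent=False),
    RecoveryPoint(datetime(2024, 1, 1, 10, 0), app_consistent=False),
]
latest = select_recovery_point(points, "latest")
latest_app = select_recovery_point(points, "latest_app_consistent")
```

Choosing the latest point minimizes data loss, while the latest app-consistent point trades some data (here, an hour) for a cleaner application state.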

Failover Type	When Used	Data Loss	Production Impact
Test Failover	DR drill / validation	None (isolated network)	No impact
Planned Failover	Scheduled maintenance	Zero (source shut down first)	Temporary downtime
Unplanned Failover	Unexpected outage	Up to RPO	Failover to target region

4. Azure Geographies and Paired Regions

Paired Regions

Azure organizes many regions into pairs, in most cases within the same geography (Brazil South, which is paired with South Central US, is a notable exception). Paired regions provide built-in advantages for disaster recovery: updates are rolled out sequentially (never to both regions simultaneously), and in the event of a broad outage, one region from each pair is prioritized for recovery.

Paired Region Examples

East US is paired with West US. North Europe is paired with West Europe. Southeast Asia is paired with East Asia. When designing a site recovery strategy, using paired regions is the recommended approach for Azure VM replication with ASR.

Cross-Region Replication Benefits

Data residency compliance: paired regions are in the same geography, satisfying data sovereignty requirements.
Sequential updates: Azure never updates both regions in a pair at the same time, reducing the risk of simultaneous outages.
Priority recovery: in a multi-region outage, one region from each pair is given recovery priority.
Physical isolation: Azure ensures a minimum distance of 300 miles between paired regions where possible.
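
As a small illustration, the pairs named in this section can be held in a lookup table to pick a default ASR target region. The table covers only those three pairs and is not a complete list of Azure region pairs; `target_region_for` is a hypothetical helper.

```python
from typing import Dict, Optional

# Pairs named in this section only; not a complete list of Azure region pairs.
_PAIRS = [
    ("East US", "West US"),
    ("North Europe", "West Europe"),
    ("Southeast Asia", "East Asia"),
]

# Build a lookup that works in both directions for the pairs above.
PAIRED_REGION: Dict[str, str] = {}
for first, second in _PAIRS:
    PAIRED_REGION[first] = second
    PAIRED_REGION[second] = first


def target_region_for(source: str) -> Optional[str]:
    """Return the paired region for a source region, or None if unknown here."""
    return PAIRED_REGION.get(source)


east_us_target = target_region_for("East US")
```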

5. Recovery Plans and Automation

Recovery Plans

Recovery plans in ASR define the order of failover for groups of VMs. Each group fails over in sequence, allowing you to control startup order for multi-tier applications. For example, Group 1 could contain databases, Group 2 application servers, and Group 3 web frontends.
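
The ordering described above can be modeled as a list of groups processed in sequence. This sketch only simulates the ordering and does not call ASR; the `Group` type, the VM names, and the `post_actions` hook are hypothetical. VMs within a group are listed one after another here purely to keep the log deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    name: str
    vms: List[str]
    # Optional actions run after the group's VMs have started (hypothetical hook).
    post_actions: List[Callable[[], str]] = field(default_factory=list)


def run_recovery_plan(groups: List[Group]) -> List[str]:
    """Return the ordered log of steps; each group finishes before the next starts."""
    log: List[str] = []
    for group in groups:
        for vm in group.vms:
            log.append(f"{group.name}: start {vm}")
        for action in group.post_actions:
            log.append(f"{group.name}: {action()}")
    return log


plan = [
    Group("Group 1", ["sql-01"], post_actions=[lambda: "verify database is online"]),
    Group("Group 2", ["app-01", "app-02"]),
    Group("Group 3", ["web-01"]),
]
steps = run_recovery_plan(plan)
```

The resulting log starts the database, runs its check, and only then moves on to the application servers and the web frontend.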

Automation with Runbooks

Azure Automation runbooks can be attached to recovery plan steps to automate tasks during failover. Common automation tasks include updating DNS records, reconfiguring load balancers, adding public IP addresses, and applying network security group rules to the target environment.

Re-Protection and Failback

After failover to the target region, you must re-protect the VMs to reverse the replication direction. Once re-protection is complete and the original region is healthy, you can perform a planned failback to return to the primary region with zero data loss.
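
The failover, re-protect, and failback cycle can be sketched as a small state machine. The state and action names are hypothetical labels for this example, and the model assumes that a second re-protect is needed after failback to resume replication toward the target region.

```python
from typing import Dict, Iterable, Tuple

# (current state, action) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("protected_in_primary", "failover"): "running_in_target",
    ("running_in_target", "reprotect"): "protected_in_target",
    ("protected_in_target", "failback"): "running_in_primary",
    ("running_in_primary", "reprotect"): "protected_in_primary",
}


def apply_action(state: str, action: str) -> str:
    """Return the next state, or raise if the action is not allowed yet."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} while {state}") from None


def run_cycle(state: str, actions: Iterable[str]) -> str:
    """Apply a sequence of actions in order and return the final state."""
    for action in actions:
        state = apply_action(state, action)
    return state


final_state = run_cycle("protected_in_primary",
                        ["failover", "reprotect", "failback", "reprotect"])
```

In this model, attempting a failback straight after failover raises an error, which mirrors the rule that re-protection must come first.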

Key Terms

Term	Definition
RPO (Recovery Point Objective)	Maximum acceptable data loss measured in time. Determines replication frequency.
RTO (Recovery Time Objective)	Maximum acceptable downtime after a disruption. Determines the speed of recovery.
Azure Site Recovery (ASR)	Azure native disaster recovery service that orchestrates replication, failover, and recovery of VMs and physical servers.
Recovery Services Vault	Management container for ASR configurations, policies, and recovery plans. For Azure-to-Azure replication it cannot be in the source region and is typically created in the target region.
Recovery Plan	Ordered group of machines that fail over together with optional scripts and manual actions between groups.
Paired Regions	Two Azure regions within the same geography that provide built-in advantages for disaster recovery including sequential updates and priority recovery.
Re-Protection	The process of reversing ASR replication direction after failover so that failback to the original region becomes possible.
Crash-Consistent Recovery Point	A recovery point capturing disk state as if the machine crashed. Created every 5 minutes by default in ASR.

Exam Tips

Planned failover is the only production failover type that guarantees zero data loss (RPO of zero) because the source is shut down first and all pending data is replicated.
Test failover uses an isolated network and does not affect production replication. Microsoft recommends testing at least every 90 days.
ASR crash-consistent recovery points are created every 5 minutes by default. App-consistent snapshots are created at a configurable interval (default every 1 hour).
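For Azure-to-Azure replication, the Recovery Services vault cannot be located in the source region. Create it in the target region (or another region) so that it stays available if the source region fails.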
For on-premises VMware to Azure replication, a configuration server and process server are required on-premises. For Hyper-V, only a Hyper-V host or VMM server is needed.
Paired regions satisfy data residency requirements because both regions are within the same geography.

Practice Questions

Question 1

Your company requires that no more than 15 minutes of data can be lost in a disaster and the application must be online within 2 hours. Which values correctly describe the RPO and RTO?

A. RPO = 2 hours, RTO = 15 minutes

B. RPO = 15 minutes, RTO = 2 hours

C. RPO = 0, RTO = 15 minutes

D. RPO = 15 minutes, RTO = 0

Answer: B

RPO defines maximum data loss (15 minutes) and RTO defines maximum downtime (2 hours). RPO and RTO are independent metrics. Zero RPO requires planned failover or synchronous replication.

Question 2

You need to validate your Azure Site Recovery configuration without affecting production workloads. Which operation should you perform?

A. Planned failover

B. Unplanned failover

C. Test failover

D. Forced failover

Answer: C

Test failover creates VMs in an isolated virtual network and does not impact production replication. Planned and unplanned failovers both affect the production environment.

Question 3

You are designing a disaster recovery strategy for Azure VMs running in East US. You need to satisfy data residency requirements within the United States. Which target region should you choose for ASR replication?

A. North Europe

B. West US

C. Canada Central

D. UK South

Answer: B

West US is the paired region for East US and is within the same geography (United States), satisfying data residency requirements. Other options are in different countries and geographies.

Question 4

During a planned maintenance window, you need to fail over your VMs with zero data loss. What must happen before the failover begins?

A. The target VMs must be pre-provisioned

B. The source VMs must be shut down and all pending data replicated

C. DNS records must be updated to point to the target region

D. The Recovery Services vault must be moved to the source region

Answer: B

Planned failover achieves zero data loss by shutting down the source VMs first and ensuring all pending replication data is transmitted to the target before failover completes.

Question 5

After failing over to the target region with ASR, you need to prepare to return workloads to the original region once it recovers. What must you do first?

A. Delete the original VMs

B. Create a new Recovery Services vault in the source region

C. Re-protect the VMs to reverse the replication direction

D. Disable replication and re-enable it manually

Answer: C

Re-protection reverses the ASR replication direction from the current (target) region back to the original (source) region. Once re-protection completes and the source region is healthy, you can perform a planned failback.
