AZ-305 - Design for High Availability | JavaInUse

1. Application Redundancy Patterns

High availability (HA) ensures that applications remain operational during infrastructure failures. The core principle is redundancy: eliminating single points of failure by deploying multiple instances of every component. Azure provides several mechanisms to achieve redundancy at different levels.

Active-Active Pattern

In an active-active pattern, multiple instances serve traffic simultaneously. A load balancer distributes requests across all instances. If one instance fails, the remaining instances absorb the traffic. This pattern provides the lowest RTO because there is no failover delay. It is recommended for mission-critical applications.
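As a rough illustration of how traffic keeps flowing when one instance drops out, the sketch below models an active-active pool with simple round-robin distribution. The class and instance names are hypothetical, and real Azure load balancers rely on health probes and their own distribution algorithms rather than this logic.

```python
class ActiveActivePool:
    """Round-robin over all instances, skipping any marked unhealthy."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # index of the next instance to try

    def mark_down(self, instance):
        # Simulates a failed health probe.
        self.healthy.discard(instance)

    def mark_up(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance, or raise if none remain."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


pool = ActiveActivePool(["vm-a", "vm-b", "vm-c"])
first_round = [pool.route() for _ in range(3)]
pool.mark_down("vm-b")  # one instance fails; no failover step is needed
after_failure = [pool.route() for _ in range(4)]
```

Because every instance was already serving traffic, losing one only shrinks the pool; the remaining instances must have enough headroom to absorb the extra load.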

Active-Passive Pattern

In an active-passive pattern, one instance (primary) handles all traffic while a standby instance (secondary) waits idle. If the primary fails, the passive instance takes over. This pattern has a higher RTO than active-active because failover is required, but it is more cost-effective since the standby can use a smaller SKU or be allocated fewer resources.
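The failover step that gives this pattern its higher RTO can be sketched as follows. This is a simplified model under stated assumptions: the class, the instance names, and the `is_healthy` callback are hypothetical, and a real failover also involves detection time, DNS or routing updates, and warming up the standby.

```python
class ActivePassivePair:
    """One instance serves traffic; the other is promoted only on failure."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby
        self.failovers = 0

    def route(self, is_healthy):
        """Return the instance that should serve the request.

        is_healthy is a callable that takes an instance name and
        returns True or False (a stand-in for a health probe).
        """
        if not is_healthy(self.active):
            if not is_healthy(self.standby):
                raise RuntimeError("both primary and standby are unavailable")
            # Failover: swap roles so the standby becomes the active instance.
            self.active, self.standby = self.standby, self.active
            self.failovers += 1
        return self.active


health = {"primary-vm": True, "standby-vm": True}
pair = ActivePassivePair("primary-vm", "standby-vm")
before = pair.route(health.get)
health["primary-vm"] = False  # primary fails
after = pair.route(health.get)
```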

Multi-Region Deployments

For maximum availability, deploy your application in two or more Azure regions. Use Azure Front Door or Azure Traffic Manager to route traffic between regions. Multi-region architectures protect against entire region outages and can provide geographic proximity for lower latency.
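A minimal sketch of health-aware, latency-based region selection, loosely in the spirit of what a global router does. The function name, region names, and latency figures are illustrative assumptions; this is not an Azure Front Door or Traffic Manager API.

```python
def pick_region(latency_ms, healthy_regions):
    """Return the healthy region with the lowest measured latency.

    latency_ms: dict mapping region name to latency in milliseconds.
    healthy_regions: set of region names currently passing health checks.
    """
    candidates = {
        region: ms for region, ms in latency_ms.items()
        if region in healthy_regions
    }
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


latencies = {"westeurope": 18, "eastus": 95, "southeastasia": 210}
normal = pick_region(latencies, {"westeurope", "eastus", "southeastasia"})
# If the closest region goes down, traffic shifts to the next-best region.
during_outage = pick_region(latencies, {"eastus", "southeastasia"})
```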

2. Essential HA Components

Availability Sets

An Availability Set distributes VMs across fault domains and update domains within a single Azure datacenter. A fault domain is a group of VMs that share a common power source and network switch, so a hardware failure affects only the VMs in that fault domain. An update domain is a group of VMs that can be rebooted at the same time; during planned maintenance, Azure reboots only one update domain at a time.

  • Up to 3 fault domains per availability set
  • Up to 20 update domains per availability set
  • Provides a 99.95% SLA when two or more VMs are deployed in the availability set
  • Protects against hardware failures and planned maintenance within a single datacenter
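To make the fault domain and update domain idea concrete, the sketch below spreads VMs across domains in sequence and checks which VMs survive the loss of one fault domain. The function names and VM names are hypothetical, the sequential placement is an assumption of this example, and `update_domains=5` is simply a value chosen within the limit of 20 listed above.

```python
def assign_domains(vm_count, fault_domains=3, update_domains=5):
    """Assign each VM a fault domain and update domain in round-robin order."""
    if not 1 <= fault_domains <= 3:
        raise ValueError("an availability set has at most 3 fault domains")
    if not 1 <= update_domains <= 20:
        raise ValueError("an availability set has at most 20 update domains")
    return [
        {
            "vm": f"vm-{index}",
            "fault_domain": index % fault_domains,
            "update_domain": index % update_domains,
        }
        for index in range(vm_count)
    ]


def survivors(placements, failed_fault_domain):
    """Return the VMs that keep running when one fault domain fails."""
    return [
        p["vm"] for p in placements
        if p["fault_domain"] != failed_fault_domain
    ]


placements = assign_domains(6)
still_running = survivors(placements, failed_fault_domain=0)
```

With six VMs over three fault domains, a single rack failure takes out two VMs and leaves four running.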

Availability Zones

Availability Zones are physically separate locations within an Azure region. Each zone has independent power, cooling, and networking. Deploying VMs across availability zones protects against datacenter-level failures.

Availability Zones SLA

VMs deployed across two or more Availability Zones receive a 99.99% SLA, which is higher than the 99.95% SLA for Availability Sets. Zones provide the highest level of availability within a single region.
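The gap between 99.95% and 99.99% is easier to judge as permitted downtime. The helper below is a small worked calculation, assuming a 30-day month as the measurement window; the function name is illustrative.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Maximum downtime (in minutes) that still meets the SLA over the period."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


for sla in (99.9, 99.95, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% SLA allows about {minutes:.2f} minutes of downtime per 30 days")
```

Under this assumption, 99.95% permits roughly 21.6 minutes per month, while 99.99% permits roughly 4.3 minutes.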

Virtual Machine Scale Sets (VMSS)

VMSS allows you to deploy and manage a group of identical VMs that automatically scale based on demand. VMSS supports deployment across Availability Zones (zone-redundant VMSS) and integrates with Azure Load Balancer or Application Gateway for traffic distribution. VMSS provides both high availability and auto-scaling in a single service.

| HA Component | Scope | SLA | Protects Against |
| --- | --- | --- | --- |
| Availability Set | Single datacenter | 99.95% | Hardware failure, planned maintenance |
| Availability Zones | Multiple datacenters in a region | 99.99% | Datacenter-level failure |
| Multi-Region | Multiple regions | Composite of the regional SLAs (no single published figure) | Region-level failure |
| VMSS (zone-redundant) | Multiple zones | 99.99% | Datacenter failure + auto-scaling |
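A composite figure for a multi-region design can be estimated with basic probability. The sketch below assumes that failures are independent, which real regions only approximate, and it ignores the availability of the global router itself; the function names are illustrative.

```python
from math import prod


def serial_availability(*components):
    """All components must be up (for example, web tier AND database)."""
    return prod(components)


def parallel_availability(*replicas):
    """At least one replica must be up (for example, region A OR region B)."""
    return 1 - prod(1 - a for a in replicas)


# Two regions, each offering 99.99% on its own.
two_regions = parallel_availability(0.9999, 0.9999)

# A chain of dependencies lowers availability instead of raising it.
web_plus_db = serial_availability(0.9999, 0.9995)

print(f"two regions in parallel: {two_regions:.8f}")
print(f"web tier and database in series: {web_plus_db:.8f}")
```

The parallel case shows why redundancy across regions pushes the composite figure up, while the serial case shows why each added dependency pulls it down.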

3. Storage Redundancy for High Availability

Storage Redundancy Types

Azure Storage provides multiple redundancy options that determine how and where your data is replicated. The choice depends on durability requirements, read-access needs, and cost constraints.

Local and Zone Redundancy

LRS (Locally Redundant Storage) replicates data three times within a single datacenter. It provides 11 nines of durability but does not protect against datacenter or regional failures. ZRS (Zone-Redundant Storage) replicates data across three availability zones in a region, providing protection against datacenter-level failures with 12 nines of durability.

Geo-Redundancy Options

GRS replicates data to a paired region (six copies total: three in primary, three in secondary). RA-GRS adds read access to the secondary. GZRS combines zone-redundancy in the primary with geo-replication to the secondary. RA-GZRS provides zone-redundancy in the primary plus read access to the geo-replicated secondary. RA-GZRS offers the highest storage availability.

| Redundancy Type | Copies | Scope | Read Access (Secondary) | Durability (nines) |
| --- | --- | --- | --- | --- |
| LRS | 3 | Single datacenter | N/A | 11 |
| ZRS | 3 | Three availability zones | N/A | 12 |
| GRS | 6 | Two regions | No (until failover) | 16 |
| RA-GRS | 6 | Two regions | Yes | 16 |
| GZRS | 6 | Three zones + secondary region | No (until failover) | 16 |
| RA-GZRS | 6 | Three zones + secondary region | Yes | 16 |
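The table can be read as a small decision procedure. The function below is one way to encode it; the function and parameter names are hypothetical, and it picks the least-redundant option that satisfies the stated requirements, without weighing cost or service-specific support.

```python
def choose_redundancy(survive_zone_failure, survive_region_failure,
                      read_from_secondary=False):
    """Return the least-redundant storage option meeting the requirements."""
    if read_from_secondary and not survive_region_failure:
        # Read access to a secondary only exists for geo-replicated options.
        raise ValueError("secondary read access requires geo-replication")

    if survive_region_failure:
        base = "GZRS" if survive_zone_failure else "GRS"
        return f"RA-{base}" if read_from_secondary else base

    return "ZRS" if survive_zone_failure else "LRS"


print(choose_redundancy(True, False))        # zone protection only
print(choose_redundancy(False, True))        # geo-replication only
print(choose_redundancy(True, True, True))   # zone + geo + secondary reads
```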

4. HA for Non-Relational Storage

Cosmos DB Multi-Region Writes

Azure Cosmos DB achieves high availability through multi-region replication. You can add any number of Azure regions to your Cosmos DB account with turnkey global distribution. With multi-region writes enabled, the application can write to any region, providing both high availability and low-latency writes for globally distributed users.

Cosmos DB SLAs

A single-region Cosmos DB account provides 99.99% availability. A multi-region account with a single write region provides 99.99% for writes and 99.999% for reads. A multi-region account with multi-region writes provides 99.999% availability for both reads and writes.

Cosmos DB Automatic Failover

When multiple regions are configured, Cosmos DB can perform automatic failover if the write region becomes unavailable. Failover priority is configurable, and the service promotes the next region in the priority list to be the new write region. Client applications using the Azure Cosmos DB SDK are automatically redirected to the new write region.
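The priority-based promotion described above can be sketched in a few lines. This models only the selection logic; the function name and region list are illustrative assumptions and do not represent the Cosmos DB SDK or management API.

```python
def select_write_region(failover_priorities, unavailable_regions):
    """Return the highest-priority region that is still available.

    failover_priorities: region names ordered by priority; the first
    entry is the current write region.
    unavailable_regions: set of region names experiencing an outage.
    """
    for region in failover_priorities:
        if region not in unavailable_regions:
            return region
    raise RuntimeError("no configured region is available")


priorities = ["East US", "West US", "North Europe"]
normal = select_write_region(priorities, set())
after_outage = select_write_region(priorities, {"East US"})
double_outage = select_write_region(priorities, {"East US", "West US"})
```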

5. HA for Relational SQL

Auto-Failover Groups

Azure SQL auto-failover groups replicate databases to a secondary region and provide automatic failover with a single read-write listener endpoint. The listener endpoint remains constant before and after failover, so applications do not need connection string changes. Failover can be triggered automatically (based on a detected outage) or manually.

Auto-Failover Group Architecture

An auto-failover group contains a primary server in one region and a partner server in another region. Databases are replicated using asynchronous replication. The group exposes two endpoints: a read-write listener (points to the primary) and a read-only listener (points to the secondary). After failover, the read-write listener automatically redirects to the new primary.
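The key property, listener names that stay fixed while the servers behind them swap roles, can be modeled as below. This is a toy model rather than an Azure SDK call: the class, group name, and server names are hypothetical, and the listener hostnames follow the `<group-name>.database.windows.net` and `<group-name>.secondary.database.windows.net` pattern.

```python
class FailoverGroup:
    """Toy model of a failover group with constant listener endpoints."""

    def __init__(self, name, primary_server, secondary_server):
        self.name = name
        self.primary = primary_server
        self.secondary = secondary_server

    @property
    def read_write_listener(self):
        return f"{self.name}.database.windows.net"

    @property
    def read_only_listener(self):
        return f"{self.name}.secondary.database.windows.net"

    def resolve(self, listener):
        """Return the server a listener currently points to."""
        if listener == self.read_write_listener:
            return self.primary
        if listener == self.read_only_listener:
            return self.secondary
        raise KeyError(f"unknown listener: {listener}")

    def failover(self):
        # Roles swap; the listener names do not change.
        self.primary, self.secondary = self.secondary, self.primary


group = FailoverGroup("contoso-fog", "sql-eastus", "sql-westus")
connection_host = group.read_write_listener  # what the application stores
before = group.resolve(connection_host)
group.failover()
after = group.resolve(connection_host)
```

The application keeps using `connection_host` unchanged; only the server it resolves to differs after failover.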

Active Geo-Replication

Active geo-replication provides up to four readable secondary replicas in different regions. Unlike auto-failover groups, it does not provide automatic failover or a single listener endpoint. You must manually initiate failover and update connection strings. Use active geo-replication when you need more than one secondary or finer control over failover.

Zone-Redundant Configuration

Azure SQL Database supports zone-redundant configuration, which distributes database replicas across availability zones within a region to protect against datacenter failures. It was originally limited to the Premium (DTU) and Business Critical (vCore) tiers; Microsoft has since extended it to the General Purpose and Hyperscale tiers. This is an intra-region HA feature and should be combined with auto-failover groups for cross-region protection.

Key Terms

| Term | Definition |
| --- | --- |
| Availability Set | Distributes VMs across fault domains and update domains within a single datacenter for a 99.95% SLA. |
| Availability Zone | Physically separate location (one or more datacenters) within an Azure region with independent power, cooling, and networking. VMs deployed across two or more zones receive a 99.99% SLA. |
| VMSS (Virtual Machine Scale Sets) | Service for deploying and managing groups of identical VMs with auto-scaling and zone-redundancy support. |
| LRS (Locally Redundant Storage) | Storage redundancy with three copies in a single datacenter. Lowest cost but no protection against datacenter failure. |
| ZRS (Zone-Redundant Storage) | Storage redundancy across three availability zones within a single region. |
| RA-GZRS | Highest availability storage option: zone-redundant in the primary region plus geo-replication with read access to the secondary region. |
| Auto-Failover Group | Azure SQL feature providing automatic cross-region failover with a constant read-write listener endpoint. |
| Multi-Region Writes | Cosmos DB feature allowing write operations in any configured region for 99.999% write and read availability. |

Exam Tips

  • Availability Zones provide a higher SLA (99.99%) than Availability Sets (99.95%). Choose Availability Zones when available for your region and VM size.
  • RA-GZRS provides the highest storage availability: zone-redundant in the primary region plus read access to the geo-replicated secondary. It is the correct answer when maximum storage resilience is required.
  • Cosmos DB with multi-region writes provides 99.999% availability for both reads and writes. This is the highest availability option for NoSQL workloads.
  • Auto-failover groups use a constant listener endpoint; active geo-replication does not. If the exam asks about seamless failover without connection string changes, auto-failover groups are the answer.
  • Azure SQL zone-redundant configuration is most often associated with the Business Critical and Premium tiers, although it is now also offered for General Purpose and Hyperscale. It protects against datacenter failures within a region but not against region-level outages.
  • VMSS supports both auto-scaling and zone-redundant deployment. It combines HA and scalability in a single service.

Practice Questions

Question 1

You need to deploy virtual machines in Azure with the highest possible availability SLA within a single region. What should you use?

A. Availability Set with 3 fault domains

B. Availability Zones across 2 or more zones

C. A single VM with Premium SSD

D. Virtual Machine Scale Set in a single zone

Answer: B

Availability Zones provide a 99.99% SLA by distributing VMs across physically separate datacenters within a region. Availability Sets provide only 99.95%. A single VM with Premium SSD provides 99.9%.

Question 2

Your organization requires that storage data survives a complete datacenter failure within a region but does not need geo-replication. Which redundancy type should you select?

A. LRS (Locally Redundant Storage)

B. ZRS (Zone-Redundant Storage)

C. GRS (Geo-Redundant Storage)

D. RA-GRS (Read-Access Geo-Redundant Storage)

Answer: B

ZRS replicates data across three availability zones within a single region, so it survives datacenter-level failures. LRS replicates only within a single datacenter. GRS and RA-GRS add geo-replication, which is not required here and would add unnecessary cost.

Question 3

You need the highest possible read and write availability for your Cosmos DB application. Which configuration should you use?

A. Single-region Cosmos DB account

B. Multi-region account with a single write region

C. Multi-region account with multi-region writes enabled

D. Single-region account with the Strong consistency level

Answer: C

Multi-region writes provide 99.999% availability for both reads and writes. A single write region provides 99.99% for writes and 99.999% for reads. Single-region provides only 99.99% for both.

Question 4

Your Azure SQL Database must fail over to another region automatically and applications must not need to change their connection strings. Which feature should you configure?

A. Active geo-replication

B. Auto-failover groups

C. Azure Site Recovery

D. Zone-redundant configuration

Answer: B

Auto-failover groups provide automatic failover with a constant read-write listener endpoint. Active geo-replication requires manual failover and connection string updates. Zone-redundant configuration only protects within a single region.

Question 5

You need the maximum possible durability and availability for an Azure Storage account that stores critical backup data. Reads from the secondary region must be available without performing a failover. Which redundancy option should you choose?

A. GRS

B. ZRS

C. RA-GRS

D. RA-GZRS

Answer: D

RA-GZRS provides zone-redundancy in the primary region (protecting against datacenter failure), geo-replication to a secondary region (protecting against regional failure), and read access to the secondary. It offers the highest combined availability and durability of all storage redundancy options.
