AZ-305 - Design a Data Management Strategy
1. Managed vs Unmanaged Data
When designing a data management strategy, the first decision is whether to use managed (PaaS) or unmanaged (IaaS) data services. This choice impacts operational overhead, scalability, security responsibilities, and cost.
Managed Data Services (PaaS)
PaaS Database Services
Managed data services like Azure SQL Database, Azure Cosmos DB, and Azure Database for PostgreSQL/MySQL handle infrastructure management including patching, backups, high availability, and scaling. You focus on schema design, queries, and application logic. The cloud provider manages the underlying VMs, storage, and networking.
Unmanaged Data (IaaS)
IaaS Database Hosting
Running databases on Azure VMs (e.g., SQL Server on an Azure VM) gives you full control over the database engine, OS configuration, and instance settings. This is ideal when you need specific database engine features, custom configurations, or database versions that the PaaS offerings do not support. However, you are responsible for patching, backups, HA configuration, and disaster recovery.
| Aspect | Managed (PaaS) | Unmanaged (IaaS) |
|---|---|---|
| Infrastructure Management | Handled by Azure | Your responsibility |
| Patching and Updates | Automatic | Manual |
| Backups | Built-in automated | Must configure manually |
| High Availability | Built-in SLA | Must configure (Always On, clustering) |
| Customization | Limited to service features | Full OS and engine control |
| Cost Model | Pay for service tier/usage | Pay for VM, storage, licensing |
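The tradeoffs in the table above can be condensed into a toy decision helper. This is a sketch, not an official Azure decision tree; the input flags and service names it returns are illustrative simplifications.

```python
def recommend_hosting(needs_os_access: bool,
                      needs_unsupported_engine_version: bool,
                      wants_minimal_ops: bool) -> str:
    """Toy helper mirroring the PaaS/IaaS tradeoffs in the table above.

    The flags are illustrative, not an official Azure decision tree.
    """
    if needs_os_access or needs_unsupported_engine_version:
        # Full OS and engine control is only available on IaaS.
        return "IaaS (SQL Server on Azure VM)"
    if wants_minimal_ops:
        # Azure handles patching, backups, and HA for PaaS services.
        return "PaaS (e.g., Azure SQL Database)"
    # PaaS is the usual default for new applications.
    return "PaaS (default) - revisit if custom engine needs emerge"

print(recommend_hosting(False, False, True))   # PaaS recommendation
print(recommend_hosting(True, False, False))   # IaaS recommendation
```

The key pivot matches the exam guidance: any requirement for OS-level or engine-level control forces IaaS; otherwise PaaS minimizes operational overhead.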
2. Relational vs NoSQL Database Strategy
Choosing between relational and NoSQL databases depends on your data structure, query patterns, scalability requirements, and consistency needs. Azure offers strong options in both categories.
Relational Databases in Azure
Azure SQL Database
Azure SQL Database is a fully managed PaaS relational database engine based on the latest stable SQL Server engine. Key features: automatic tuning, intelligent performance insights, built-in HA with 99.99% SLA, geo-replication for disaster recovery, elastic pools for multi-tenant scenarios, and serverless compute tier for intermittent workloads.
Azure SQL Managed Instance
Azure SQL Managed Instance provides near-100% compatibility with the on-premises SQL Server engine. It supports features that Azure SQL Database does not, including cross-database queries, SQL Server Agent, CLR integration, Service Broker, and Database Mail. It is ideal for lift-and-shift migration of on-premises SQL Server databases with minimal application changes.
NoSQL Databases in Azure
Azure Cosmos DB
Azure Cosmos DB is a globally distributed, multi-model NoSQL database. It supports multiple APIs: NoSQL (document), MongoDB, Cassandra, Gremlin (graph), and Table. Key differentiators: single-digit millisecond latency at the 99th percentile, five consistency levels (strong, bounded staleness, session, consistent prefix, eventual), turnkey global distribution, and guaranteed 99.999% availability for multi-region accounts.
| Feature | Relational (Azure SQL) | NoSQL (Cosmos DB) |
|---|---|---|
| Data Model | Structured, schema-enforced tables | Flexible schema: documents, key-value, graph, column-family |
| Query Language | T-SQL | SQL-like API, MongoDB queries, Gremlin, CQL |
| Scaling | Vertical (scale up) + read replicas | Horizontal (partition-based, global distribution) |
| Consistency | Strong (ACID transactions) | Tunable (5 consistency levels) |
| Best For | Structured data, complex joins, transactions | High-scale, low-latency, globally distributed applications |
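The "horizontal (partition-based)" scaling in the table rests on hash partitioning: each item's partition key is hashed to a physical partition, so data and load spread across machines. Cosmos DB's actual hashing scheme is internal; this sketch just illustrates the mechanism.

```python
import hashlib

def assign_partition(partition_key: str, physical_partitions: int) -> int:
    """Hash a partition key to a physical partition index.

    A simplified sketch of hash-based horizontal partitioning, the
    mechanism behind Cosmos DB's partition-based scaling. The real
    hashing scheme is internal to the service.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partitions

# Items sharing a partition key always land on the same partition,
# so per-key reads and writes stay single-partition and fast.
p1 = assign_partition("customer-42", 10)
p2 = assign_partition("customer-42", 10)
print(p1 == p2)  # True
```

This is also why choosing a high-cardinality partition key matters: a key with few distinct values concentrates load on a handful of partitions and defeats horizontal scaling.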
3. Database Auditing, DTUs, and Request Units
Database Auditing
Azure SQL Database Auditing
Auditing tracks database events and writes them to an audit log in Azure Storage, Log Analytics, or Event Hubs. It records operations such as data queries (SELECT), data changes (INSERT, UPDATE, DELETE), schema changes, authentication events, and permission changes. Auditing can be configured at the server level (applies to all databases) or at the individual database level.
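Once audit events land in a destination, compliance work is mostly filtering them by operation type. The record shape below (dicts with 'action' and 'principal' keys) is a hypothetical simplification; real Azure SQL audit log entries carry many more fields.

```python
def filter_audit_events(events, actions=("INSERT", "UPDATE", "DELETE")):
    """Filter audit records down to data-change operations.

    The record shape here is a hypothetical simplification of real
    Azure SQL audit log entries, which include statement text,
    timestamps, client IPs, and more.
    """
    return [e for e in events if e["action"] in actions]

events = [
    {"action": "SELECT", "principal": "report_reader"},
    {"action": "UPDATE", "principal": "app_user"},
    {"action": "DELETE", "principal": "admin"},
]
changes = filter_audit_events(events)
print([e["action"] for e in changes])  # ['UPDATE', 'DELETE']
```

In practice you would run this kind of filter as a KQL query in Log Analytics rather than in application code, which is why that destination suits compliance analysis.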
DTU (Database Transaction Unit)
Understanding DTUs
A DTU is a blended measure of CPU, memory, and I/O resources. Azure SQL Database offers DTU-based purchasing with predefined bundles: Basic (5 DTUs, light workloads), Standard (10-3000 DTUs, moderate workloads), Premium (125-4000 DTUs, high-performance workloads). DTUs simplify pricing but do not allow independent scaling of CPU, memory, or storage. For independent resource scaling, use the vCore-based model.
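The tier boundaries above can be encoded as a small lookup. The DTU ranges come from the tiers just described; the `high_io` flag is a simplification standing in for Premium's much higher I/O per DTU.

```python
def pick_dtu_tier(required_dtus: int, high_io: bool = False) -> str:
    """Map a DTU requirement onto the DTU-model service tiers.

    Ranges per the DTU purchasing model: Basic (5), Standard (10-3000),
    Premium (125-4000). high_io is a simplification: Premium offers
    far higher I/O throughput per DTU than Standard.
    """
    if required_dtus > 4000:
        raise ValueError("Beyond DTU model limits; consider the vCore model")
    if high_io or required_dtus > 3000:
        return "Premium"
    if required_dtus > 5:
        return "Standard"
    return "Basic"

print(pick_dtu_tier(5))                  # Basic
print(pick_dtu_tier(100))                # Standard
print(pick_dtu_tier(200, high_io=True))  # Premium
```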
DTU vs vCore Purchasing Models
| Aspect | DTU Model | vCore Model |
|---|---|---|
| Resources | Bundled (CPU + memory + I/O) | Independent (choose vCores, memory, storage) |
| Pricing | Simpler, fixed tiers | More flexible, can use Azure Hybrid Benefit |
| Use Case | Simple workloads, easy to understand | Predictable workloads needing specific resource configurations |
| Reserved Capacity | Not available | Available (1 or 3-year reservations) |
| Serverless | Not available | Available (auto-pause, per-second billing) |
RU/s (Request Units per Second)
Understanding Request Units
In Azure Cosmos DB, throughput is measured in Request Units per second (RU/s). A single RU represents the cost of a point read: fetching a 1 KB item by its ID and partition key. Write operations cost approximately 5-10 RUs per 1 KB item. Complex queries consume more RUs depending on the number of items scanned and the complexity of the operations. Throughput (RU/s) can be provisioned at the database or container level.
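These rules of thumb support back-of-envelope capacity planning. The estimator below assumes 1 RU per 1 KB point read and 5 RUs per 1 KB write, scaled with item size; real charges also depend on indexing policy and consistency level, so treat it as a sketch.

```python
import math

def estimate_ru(operation: str, item_kb: float) -> float:
    """Rough per-operation RU estimate from the rules of thumb above:
    ~1 RU for a 1 KB point read, ~5 RUs for a 1 KB write. Real charges
    also depend on indexing policy and consistency level.
    """
    size_factor = max(1.0, math.ceil(item_kb))  # charge grows with size
    base = {"read": 1.0, "write": 5.0}[operation]
    return base * size_factor

def required_rus_per_second(reads_per_s, writes_per_s, item_kb=1.0):
    """Back-of-envelope provisioned RU/s for a steady workload."""
    return (reads_per_s * estimate_ru("read", item_kb)
            + writes_per_s * estimate_ru("write", item_kb))

# 100 reads/s + 20 writes/s of 1 KB items: 100*1 + 20*5 = 200 RU/s
print(required_rus_per_second(100, 20))  # 200.0
```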
Throughput Modes in Cosmos DB
Provisioned throughput (manual) - You set a specific RU/s value; best for steady workloads. Provisioned throughput (autoscale) - Scales between 10% and 100% of the maximum RU/s you set; best for variable workloads. Serverless - Pay per RU consumed with no minimum; best for low-traffic or development workloads.
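The cost tradeoff between manual and autoscale provisioning can be sketched numerically. The per-RU price and the autoscale rate multiplier below are illustrative placeholders, not quoted Azure prices; check the Cosmos DB pricing page for real numbers.

```python
def monthly_cost_manual(provisioned_rus, price_per_100rus_hour=0.008):
    """Manual provisioned throughput: pay for the fixed RU/s 24/7.

    price_per_100rus_hour is an illustrative placeholder, not a
    quoted Azure price.
    """
    hours = 730  # approximate hours per month
    return provisioned_rus / 100 * price_per_100rus_hour * hours

def monthly_cost_autoscale(hourly_peak_rus, max_rus, rate_multiplier=1.5,
                           price_per_100rus_hour=0.008):
    """Autoscale: billed per hour on the highest RU/s reached that
    hour, never below 10% of the configured maximum, at a higher
    per-RU rate (the multiplier is an assumption for illustration)."""
    floor = 0.1 * max_rus
    total = 0.0
    for peak in hourly_peak_rus:
        total += max(peak, floor) / 100 * price_per_100rus_hour * rate_multiplier
    return total

# Variable workload: 700 quiet hours at 5,000 RU/s, 30 peak hours at 50,000.
peaks = [5000] * 700 + [50000] * 30
print(round(monthly_cost_manual(50000), 2))            # 2920.0
print(round(monthly_cost_autoscale(peaks, 50000), 2))  # 600.0
```

Under these illustrative prices, autoscale is far cheaper than provisioning for the peak, which is exactly the scenario where autoscale is recommended over fixed provisioning.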
4. Data Retention and Availability Strategy
Data retention strategy defines how long data is stored and in what tier. Data availability, consistency, and durability address how data is protected and accessed across regions and failure scenarios.
Data Retention
Retention Tiers and Strategies
Hot data - Frequently accessed; stored in primary database or hot storage tier. Cool data - Infrequently accessed; moved to cool storage or archived tables. Archive data - Rarely accessed; stored in archive storage for compliance at the lowest cost. Implement retention policies with automated data lifecycle management rules to move data between tiers based on age, access patterns, or compliance requirements.
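An age-based lifecycle rule like the one described can be sketched as a pure function. The 30-day and 180-day thresholds are illustrative; real policies also weigh access patterns and compliance requirements.

```python
from datetime import date, timedelta

def retention_tier(last_accessed: date, today: date,
                   cool_after_days: int = 30,
                   archive_after_days: int = 180) -> str:
    """Age-based tier assignment, sketching a lifecycle management
    rule. Thresholds are illustrative; real policies also consider
    access patterns and compliance requirements."""
    age_days = (today - last_accessed).days
    if age_days >= archive_after_days:
        return "archive"
    if age_days >= cool_after_days:
        return "cool"
    return "hot"

today = date(2024, 6, 1)
print(retention_tier(today - timedelta(days=5), today))    # hot
print(retention_tier(today - timedelta(days=60), today))   # cool
print(retention_tier(today - timedelta(days=365), today))  # archive
```

Azure Storage lifecycle management policies express the same idea declaratively (move blobs to cool or archive after N days since last modification or access).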
Data Availability and Durability
| Strategy | Azure SQL | Cosmos DB | Azure Storage |
|---|---|---|---|
| Local Redundancy | Within datacenter replicas | 4 replicas within region | LRS: 3 copies in one datacenter |
| Zone Redundancy | Zone-redundant configuration | Availability zone support | ZRS: 3 copies across 3 zones |
| Geo-Redundancy | Active geo-replication, failover groups | Multi-region writes | GRS/GZRS: copies to paired region |
| SLA | Up to 99.995% (with zone redundancy) | 99.999% (multi-region) | Up to 99.99% (RA-GZRS) |
Consistency Models
Cosmos DB Five Consistency Levels
From strongest to weakest: Strong - reads always return the most recent committed write (highest latency, lowest throughput). Bounded Staleness - reads lag behind writes by at most K versions or T time. Session - consistency within a client session; default and most popular. Consistent Prefix - reads never see out-of-order writes. Eventual - no ordering guarantee; lowest latency, highest throughput.
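Bounded staleness is the most configuration-heavy of the five levels, so it helps to see its "at most K versions or T time" bound as a predicate. The K and T defaults below are hypothetical configuration values, not Azure defaults.

```python
def bounded_staleness_ok(replica_version: int, latest_version: int,
                         replica_lag_seconds: float,
                         max_versions: int = 100,
                         max_seconds: float = 5.0) -> bool:
    """Check whether a replica read satisfies a bounded-staleness
    bound of 'at most K versions or T time' behind the writes.
    Both bounds must hold: the service throttles when either one
    would be exceeded. K and T here are hypothetical values.
    """
    version_lag = latest_version - replica_version
    return version_lag <= max_versions and replica_lag_seconds <= max_seconds

print(bounded_staleness_ok(140, 150, 2.0))  # True: 10 versions, 2 s behind
print(bounded_staleness_ok(0, 500, 1.0))    # False: 500 versions behind
```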
5. Data Warehouse Strategy
A data warehouse consolidates data from multiple sources for analytical reporting and business intelligence. Azure provides purpose-built services for enterprise-scale analytics.
Azure Synapse Analytics
Azure Synapse Analytics is an enterprise analytics service that brings together big data and data warehousing. It provides: Dedicated SQL pools - provisioned data warehouse with MPP (massively parallel processing) for complex analytical queries. Serverless SQL pools - query data in place in Azure Data Lake without loading it. Spark pools - Apache Spark for big data processing and machine learning. Synapse Pipelines - data integration and ETL/ELT orchestration.
Data Warehouse Design Patterns
ETL vs ELT
ETL (Extract, Transform, Load) - Data is extracted from source systems, transformed in a staging area (e.g., Azure Data Factory), and then loaded into the data warehouse. Best when transformations are complex and source data volumes are moderate. ELT (Extract, Load, Transform) - Data is loaded into the data warehouse first, then transformed using the warehouse engine (Synapse SQL). Best for large data volumes where the MPP engine can handle transformation efficiently.
Data Lake vs Data Warehouse
| Aspect | Data Lake | Data Warehouse |
|---|---|---|
| Data Type | Raw, semi-structured, unstructured | Structured, curated, schema-on-write |
| Schema | Schema-on-read | Schema-on-write (predefined schema) |
| Processing | Batch and streaming | Batch (optimized for complex queries) |
| Users | Data engineers, data scientists | Business analysts, reporting tools |
| Azure Service | Azure Data Lake Storage Gen2 | Azure Synapse Analytics (Dedicated SQL Pool) |
Modern Data Warehouse Architecture
A modern data warehouse on Azure typically combines Azure Data Lake Storage Gen2 for raw data storage, Azure Data Factory or Synapse Pipelines for data ingestion and orchestration, Azure Synapse Analytics for data warehousing and analytics, and Power BI for reporting and visualization. This architecture supports both structured BI reporting and exploratory big data analytics in a unified platform.
Key Terms
| Term | Definition |
|---|---|
| DTU | Database Transaction Unit - a blended measure of CPU, memory, and I/O for Azure SQL Database pricing. |
| RU/s | Request Units per second - throughput metric for Azure Cosmos DB representing the cost of database operations. |
| vCore | Virtual core purchasing model for Azure SQL that allows independent scaling of compute, memory, and storage. |
| Elastic Pool | Shared DTU or vCore resources across multiple Azure SQL databases for cost-effective multi-tenancy. |
| Cosmos DB | Globally distributed, multi-model NoSQL database with tunable consistency and single-digit millisecond latency. |
| Azure Synapse Analytics | Enterprise analytics service combining data warehousing, big data processing, and data integration. |
| Data Lake | Storage repository for raw data in native format; schema-on-read approach for flexible analytics. |
| Geo-replication | Asynchronous replication of data to a secondary region for disaster recovery and read scaling. |
Exam Tips
- Know when to use PaaS vs IaaS: Use Azure SQL Database (PaaS) for new applications. Use SQL Server on Azure VM (IaaS) when you need full engine control or features not available in PaaS. Use SQL Managed Instance for lift-and-shift migrations.
- Understand DTU vs vCore: DTU bundles resources in fixed tiers (simpler). vCore allows independent resource scaling and supports Azure Hybrid Benefit and reserved capacity (more flexible and cost-effective for predictable workloads).
- Cosmos DB consistency levels: Strong provides reads of the latest write but with higher latency. Session is the default and most commonly used. Eventual provides the lowest latency. Know the tradeoffs for exam scenarios.
- RU/s planning for Cosmos DB: A point read of a 1 KB item costs 1 RU. Writes cost 5-10x more. Complex queries cost more based on items scanned. Use autoscale for variable workloads, serverless for low/dev traffic.
- Data warehouse vs data lake: Data warehouse uses schema-on-write for structured BI queries. Data lake uses schema-on-read for flexible, large-scale analytics on raw data. Modern architectures combine both.
- Auditing supports multiple destinations: Azure SQL auditing can write to Azure Storage, Log Analytics, and Event Hubs. Server-level auditing applies to all databases; database-level auditing applies to specific databases.
Practice Questions
Question 1
Your application needs a globally distributed database with single-digit millisecond reads and automatic multi-region failover. The data model is document-based with flexible schemas. Which service should you choose?
A. Azure SQL Database with geo-replication
B. Azure Cosmos DB with multi-region writes
C. Azure SQL Managed Instance with failover groups
D. SQL Server on Azure VM with Always On Availability Groups
Answer: B
Azure Cosmos DB is designed for globally distributed, low-latency access with automatic multi-region failover. It natively supports document data models (NoSQL API, MongoDB API) with flexible schemas and guarantees single-digit millisecond latency at the 99th percentile.
Question 2
You are migrating an on-premises SQL Server database that uses cross-database queries, SQL Agent jobs, and CLR stored procedures to Azure. You need minimal application changes. Which service should you use?
A. Azure SQL Database (single database)
B. Azure SQL Database (elastic pool)
C. Azure SQL Managed Instance
D. Azure Cosmos DB
Answer: C
Azure SQL Managed Instance provides near-100% compatibility with on-premises SQL Server, including cross-database queries, SQL Server Agent, and CLR integration. Azure SQL Database does not support these features, making Managed Instance the best choice for lift-and-shift migrations.
Question 3
Your Cosmos DB application experiences highly variable traffic with peaks of 50,000 RU/s but typically uses only 5,000 RU/s. You want to avoid paying for peak capacity during quiet periods. What throughput mode should you choose?
A. Provisioned throughput set to 50,000 RU/s
B. Provisioned throughput with autoscale (max 50,000 RU/s)
C. Serverless mode
D. Provisioned throughput set to 5,000 RU/s
Answer: B
Autoscale provisioned throughput automatically scales between 10% and 100% of the maximum RU/s. It would scale between 5,000 and 50,000 RU/s, handling peaks while reducing cost during quiet periods. Serverless is not recommended for sustained, high-throughput workloads. Fixed provisioning at the peak wastes money during quiet periods, and fixed provisioning at 5,000 RU/s would throttle requests during peaks.
Question 4
Your organization needs to consolidate sales data from 15 different source systems for business intelligence reporting. The data volumes are very large (petabytes). Which architecture should you design?
A. Load all data into a single Azure SQL Database
B. Use Azure Synapse Analytics with a dedicated SQL pool for the data warehouse and Azure Data Factory for ETL
C. Store all data in Azure Cosmos DB and query with MongoDB API
D. Use Azure Table Storage with custom reporting
Answer: B
Azure Synapse Analytics with dedicated SQL pools provides massively parallel processing (MPP) designed for petabyte-scale data warehousing. Azure Data Factory handles data ingestion from multiple sources. This is the standard Azure architecture for enterprise-scale BI with large data volumes.
Question 5
You need to track all data access queries on your Azure SQL Database for compliance auditing. The audit logs must be available for analysis using KQL. Where should you configure audit log storage?
A. Azure Blob Storage
B. Azure Event Hubs
C. Log Analytics workspace
D. Azure Table Storage
Answer: C
Sending audit logs to a Log Analytics workspace enables analysis using KQL (Kusto Query Language). While Azure Blob Storage is a valid audit destination, it does not support KQL queries directly. Log Analytics provides powerful querying, alerting, and visualization capabilities for audit data analysis.