
AZ-305 - Design a Data Management Strategy

1. Managed vs Unmanaged Data

When designing a data management strategy, the first decision is whether to use managed (PaaS) or unmanaged (IaaS) data services. This choice impacts operational overhead, scalability, security responsibilities, and cost.

Managed Data Services (PaaS)

PaaS Database Services

Managed data services like Azure SQL Database, Azure Cosmos DB, and Azure Database for PostgreSQL/MySQL handle infrastructure management including patching, backups, high availability, and scaling. You focus on schema design, queries, and application logic. The cloud provider manages the underlying VMs, storage, and networking.

Unmanaged Data (IaaS)

IaaS Database Hosting

Running databases on Azure VMs (e.g., SQL Server on Azure VM) gives you full control over the database engine, OS configuration, and instance settings. This is ideal when you need specific database engine features, custom configurations, or must run unsupported database versions. However, you are responsible for patching, backups, HA configuration, and disaster recovery.

| Aspect | Managed (PaaS) | Unmanaged (IaaS) |
| --- | --- | --- |
| Infrastructure Management | Handled by Azure | Your responsibility |
| Patching and Updates | Automatic | Manual |
| Backups | Built-in, automated | Must configure manually |
| High Availability | Built-in SLA | Must configure (Always On, clustering) |
| Customization | Limited to service features | Full OS and engine control |
| Cost Model | Pay for service tier/usage | Pay for VM, storage, licensing |

2. Relational vs NoSQL Database Strategy

Choosing between relational and NoSQL databases depends on your data structure, query patterns, scalability requirements, and consistency needs. Azure offers strong options in both categories.

Relational Databases in Azure

Azure SQL Database

Azure SQL Database is a fully managed PaaS relational database engine based on the latest stable SQL Server engine. Key features: automatic tuning, intelligent performance insights, built-in HA with 99.99% SLA, geo-replication for disaster recovery, elastic pools for multi-tenant scenarios, and serverless compute tier for intermittent workloads.

Azure SQL Managed Instance

Azure SQL Managed Instance provides near-100% compatibility with the on-premises SQL Server engine. It supports features that Azure SQL Database does not, including cross-database queries, SQL Server Agent, CLR integration, Service Broker, and Database Mail. It is ideal for lift-and-shift migration of on-premises SQL Server databases with minimal application changes.

NoSQL Databases in Azure

Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model NoSQL database. It supports multiple APIs: NoSQL (document), MongoDB, Cassandra, Gremlin (graph), and Table. Key differentiators: single-digit millisecond latency at the 99th percentile, five consistency levels (strong, bounded staleness, session, consistent prefix, eventual), automatic and instant global distribution, and guaranteed 99.999% availability for multi-region accounts.

| Feature | Relational (Azure SQL) | NoSQL (Cosmos DB) |
| --- | --- | --- |
| Data Model | Structured, schema-enforced tables | Flexible schema: documents, key-value, graph, column-family |
| Query Language | T-SQL | SQL-like API, MongoDB queries, Gremlin, CQL |
| Scaling | Vertical (scale up) + read replicas | Horizontal (partition-based, global distribution) |
| Consistency | Strong (ACID transactions) | Tunable (5 consistency levels) |
| Best For | Structured data, complex joins, transactions | High-scale, low-latency, globally distributed applications |
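The "horizontal (partition-based)" scaling in the table above works by hashing each item's partition key to a physical partition, so data and throughput spread across many machines. The sketch below illustrates the idea with a simple hash; Cosmos DB's internal hashing and partition management differ, and the key names are hypothetical.

```python
import hashlib

# Illustrative hash partitioning: map a partition-key value to one of N
# physical partitions. Items with the same key always land together,
# which is why choosing a high-cardinality partition key matters.
def physical_partition(partition_key: str, partition_count: int) -> int:
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partition_count

for key in ("user-1001", "user-1002", "user-1003"):
    print(key, "->", physical_partition(key, 4))
```

Because the mapping is deterministic, point reads that supply the partition key can be routed to a single partition instead of fanning out to all of them.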

3. Database Auditing and the DTU/RU Concepts

Database Auditing

Azure SQL Database Auditing

Auditing tracks database events and writes them to an audit log in Azure Storage, Log Analytics, or Event Hubs. It records operations such as data queries (SELECT), data changes (INSERT, UPDATE, DELETE), schema changes, authentication events, and permission changes. Auditing can be configured at the server level (applies to all databases) or at the individual database level.
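A minimal sketch of the event categories auditing captures, expressed as a classifier over SQL statement verbs. A real Azure SQL audit log carries structured action IDs rather than raw statements; this string matching is purely illustrative.

```python
# Classify a SQL statement into the audit categories described above:
# data queries, data changes, and schema changes. Hypothetical helper,
# not part of any Azure SDK.
def classify(statement: str) -> str:
    verb = statement.strip().split()[0].upper()
    if verb == "SELECT":
        return "data query"
    if verb in ("INSERT", "UPDATE", "DELETE"):
        return "data change"
    if verb in ("CREATE", "ALTER", "DROP"):
        return "schema change"
    return "other"

print(classify("SELECT * FROM Sales"))           # data query
print(classify("ALTER TABLE Sales ADD Region"))  # schema change
```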

DTU (Database Transaction Unit)

Understanding DTUs

A DTU is a blended measure of CPU, memory, and I/O resources. Azure SQL Database offers DTU-based purchasing with predefined bundles: Basic (5 DTUs, light workloads), Standard (10-3000 DTUs, moderate workloads), Premium (125-4000 DTUs, high-performance workloads). DTUs simplify pricing but do not allow independent scaling of CPU, memory, or storage. For independent resource scaling, use the vCore-based model.
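The tier ranges above can be turned into a simple sizing rule of thumb. This helper is a hypothetical sketch using only the DTU ranges stated here; real sizing should use Azure's DTU calculator and actual workload telemetry.

```python
# Pick the lowest service tier whose DTU range covers a required DTU
# level, per the ranges described above. Simplified: ignores storage
# limits and the I/O differences between Standard and Premium.
def pick_dtu_tier(required_dtus: int) -> str:
    if required_dtus <= 5:
        return "Basic"      # 5 DTUs, light workloads
    if required_dtus <= 3000:
        return "Standard"   # 10-3000 DTUs, moderate workloads
    if required_dtus <= 4000:
        return "Premium"    # 125-4000 DTUs, high-performance workloads
    raise ValueError("Beyond DTU limits: consider the vCore model")

print(pick_dtu_tier(5))     # Basic
print(pick_dtu_tier(200))   # Standard
print(pick_dtu_tier(3500))  # Premium
```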

DTU vs vCore Purchasing Models

| Aspect | DTU Model | vCore Model |
| --- | --- | --- |
| Resources | Bundled (CPU + memory + I/O) | Independent (choose vCores, memory, storage) |
| Pricing | Simpler, fixed tiers | More flexible, can use Azure Hybrid Benefit |
| Use Case | Simple workloads, easy to understand | Predictable workloads needing specific resource configurations |
| Reserved Capacity | Not available | Available (1 or 3-year reservations) |
| Serverless | Not available | Available (auto-pause, per-second billing) |

RU/s (Request Units per Second)

Understanding Request Units

In Azure Cosmos DB, throughput is measured in Request Units per second (RU/s). A single RU represents the cost of reading a 1 KB item by its ID and partition key (a point read). Write operations cost approximately 5-10 RUs per 1 KB item. Complex queries consume more RUs depending on the number of items scanned and the complexity of the operations. Throughput can be provisioned at the database or container level.
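These rules of thumb make back-of-envelope RU/s sizing straightforward. The sketch below uses the approximate costs above (1 RU per KB for point reads, 5 RUs per KB as the lower bound for writes); actual charges depend on indexing policy, consistency level, and query shape.

```python
import math

# Rough RU estimator based on the rules of thumb above. Hypothetical
# helper for illustration, not an official Cosmos DB calculator.
def estimate_rus(item_kb: float, operation: str) -> float:
    size_units = math.ceil(item_kb)        # RU charges grow with item size
    if operation == "point_read":
        return 1.0 * size_units
    if operation == "write":
        return 5.0 * size_units            # lower bound of the 5-10 RU range
    raise ValueError(f"unknown operation: {operation}")

# Sizing example: 100 point reads/s + 20 writes/s of 2 KB items
rus_needed = 100 * estimate_rus(2, "point_read") + 20 * estimate_rus(2, "write")
print(f"Provision at least {rus_needed:.0f} RU/s")  # 400 RU/s
```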

Throughput Modes in Cosmos DB

  • Provisioned throughput (manual) - You set a specific RU/s value; best for steady workloads.
  • Provisioned throughput (autoscale) - Scales between 10% and 100% of the maximum RU/s you set; best for variable workloads.
  • Serverless - Pay per RU consumed with no minimum; best for low-traffic or development workloads.
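The autoscale behavior can be sketched as a simple clamp: capacity follows demand but never drops below 10% of the configured maximum or rises above it. This is an illustrative model only; Azure's actual scaling and billing granularity differ.

```python
# Model autoscale capacity for a given instantaneous demand.
# Demand above max_rus is throttled (requests get rate-limited),
# which is why the result is capped at max_rus.
def autoscale_capacity(demand_rus: float, max_rus: float) -> float:
    floor = 0.10 * max_rus                 # autoscale never scales below 10%
    return min(max(demand_rus, floor), max_rus)

for demand in (1_000, 5_000, 50_000, 80_000):
    print(demand, "->", autoscale_capacity(demand, 50_000))
```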

4. Data Retention and Availability Strategy

Data retention strategy defines how long data is stored and in what tier. Data availability, consistency, and durability address how data is protected and accessed across regions and failure scenarios.

Data Retention

Retention Tiers and Strategies

  • Hot data - Frequently accessed; stored in the primary database or hot storage tier.
  • Cool data - Infrequently accessed; moved to cool storage or archived tables.
  • Archive data - Rarely accessed; stored in archive storage for compliance at the lowest cost.

Implement retention policies with automated data lifecycle management rules that move data between tiers based on age, access patterns, or compliance requirements.
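An age-based lifecycle rule can be sketched in a few lines. The 90-day and 365-day thresholds here are hypothetical policy values chosen for illustration, not Azure defaults.

```python
from datetime import date, timedelta

# Assign a storage tier from a record's age, mirroring the
# hot/cool/archive tiers described above. Thresholds are hypothetical.
def storage_tier(last_accessed: date, today: date) -> str:
    age = today - last_accessed
    if age <= timedelta(days=90):
        return "hot"
    if age <= timedelta(days=365):
        return "cool"
    return "archive"

today = date(2024, 6, 1)
print(storage_tier(date(2024, 5, 1), today))  # hot
print(storage_tier(date(2024, 1, 1), today))  # cool
print(storage_tier(date(2022, 1, 1), today))  # archive
```

In practice the equivalent rules are expressed declaratively, e.g. as Azure Storage lifecycle management policies rather than application code.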

Data Availability and Durability

| Strategy | Azure SQL | Cosmos DB | Azure Storage |
| --- | --- | --- | --- |
| Local Redundancy | Within-datacenter replicas | 4 replicas within region | LRS: 3 copies in one datacenter |
| Zone Redundancy | Zone-redundant configuration | Availability zone support | ZRS: 3 copies across 3 zones |
| Geo-Redundancy | Active geo-replication, failover groups | Multi-region writes | GRS/GZRS: copies to paired region |
| SLA | Up to 99.995% (with zone redundancy) | 99.999% (multi-region) | Up to 99.99% (RA-GZRS) |

Consistency Models

Cosmos DB Five Consistency Levels

From strongest to weakest:

  • Strong - reads always return the most recent committed write (highest latency, lowest throughput).
  • Bounded Staleness - reads lag behind writes by at most K versions or T time.
  • Session - consistency within a client session; the default and most popular choice.
  • Consistent Prefix - reads never see out-of-order writes.
  • Eventual - no ordering guarantee; lowest latency, highest throughput.
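The choice among these levels usually starts from two questions: must every reader worldwide see the latest write, and is read-your-own-writes within a session enough? A minimal decision sketch, with both requirement flags hypothetical simplifications:

```python
# Ordered strongest -> weakest, per the list above.
LEVELS = ["strong", "bounded_staleness", "session",
          "consistent_prefix", "eventual"]

# Pick a consistency level from two simplified requirements.
# Real designs also weigh latency, throughput, and RU cost.
def choose_consistency(needs_latest_globally: bool,
                       per_session_ok: bool) -> str:
    if needs_latest_globally:
        return "strong"     # always read the latest committed write
    if per_session_ok:
        return "session"    # read-your-own-writes within a session
    return "eventual"       # no ordering guarantee, lowest latency

print(choose_consistency(True, False))   # strong
print(choose_consistency(False, True))   # session
```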

5. Data Warehouse Strategy

A data warehouse consolidates data from multiple sources for analytical reporting and business intelligence. Azure provides purpose-built services for enterprise-scale analytics.

Azure Synapse Analytics

Azure Synapse Analytics is an enterprise analytics service that brings together big data and data warehousing. It provides:

  • Dedicated SQL pools - a provisioned data warehouse with MPP (massively parallel processing) for complex analytical queries.
  • Serverless SQL pools - query data in place in Azure Data Lake without loading it.
  • Spark pools - Apache Spark for big data processing and machine learning.
  • Synapse Pipelines - data integration and ETL/ELT orchestration.

Data Warehouse Design Patterns

ETL vs ELT

  • ETL (Extract, Transform, Load) - Data is extracted from source systems, transformed in a staging area (e.g., by Azure Data Factory), and then loaded into the data warehouse. Best when transformations are complex and source data volumes are moderate.
  • ELT (Extract, Load, Transform) - Data is loaded into the data warehouse first, then transformed using the warehouse engine (Synapse SQL). Best for large data volumes where the MPP engine can handle transformations efficiently.
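The difference is purely in where the transform runs relative to the load. The sketch below uses plain Python lists as stand-ins for the source system, staging area, and warehouse; the "transform" (dropping rows with missing amounts) is a hypothetical example.

```python
# Contrast ETL vs ELT order of operations with toy data structures.
source = [{"id": 1, "amount": 100}, {"id": 2, "amount": None}]

def transform(rows):
    """Hypothetical cleansing step: drop rows with missing amounts."""
    return [r for r in rows if r["amount"] is not None]

# ETL: transform in staging first, then load only the clean rows.
warehouse_etl = transform(source)

# ELT: load the raw rows first, then transform inside the warehouse
# (in Synapse this step would be T-SQL run by the MPP engine).
warehouse_raw = list(source)
warehouse_elt = transform(warehouse_raw)

print(warehouse_etl == warehouse_elt)  # True - same result, different place
```

Both paths yield the same curated data; ELT wins at scale because the warehouse's parallel engine does the heavy lifting, while ETL keeps raw data out of the warehouse entirely.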

Data Lake vs Data Warehouse

| Aspect | Data Lake | Data Warehouse |
| --- | --- | --- |
| Data Type | Raw, semi-structured, unstructured | Structured, curated |
| Schema | Schema-on-read | Schema-on-write (predefined schema) |
| Processing | Batch and streaming | Batch (optimized for complex queries) |
| Users | Data engineers, data scientists | Business analysts, reporting tools |
| Azure Service | Azure Data Lake Storage Gen2 | Azure Synapse Analytics (Dedicated SQL Pool) |

Modern Data Warehouse Architecture

A modern data warehouse on Azure typically combines Azure Data Lake Storage Gen2 for raw data storage, Azure Data Factory or Synapse Pipelines for data ingestion and orchestration, Azure Synapse Analytics for data warehousing and analytics, and Power BI for reporting and visualization. This architecture supports both structured BI reporting and exploratory big data analytics in a unified platform.

Key Terms

| Term | Definition |
| --- | --- |
| DTU | Database Transaction Unit - a blended measure of CPU, memory, and I/O for Azure SQL Database pricing. |
| RU/s | Request Units per second - throughput metric for Azure Cosmos DB representing the cost of database operations. |
| vCore | Virtual core purchasing model for Azure SQL that allows independent scaling of compute, memory, and storage. |
| Elastic Pool | Shared DTU or vCore resources across multiple Azure SQL databases for cost-effective multi-tenancy. |
| Cosmos DB | Globally distributed, multi-model NoSQL database with tunable consistency and single-digit millisecond latency. |
| Azure Synapse Analytics | Enterprise analytics service combining data warehousing, big data processing, and data integration. |
| Data Lake | Storage repository for raw data in native format; schema-on-read approach for flexible analytics. |
| Geo-replication | Asynchronous replication of data to a secondary region for disaster recovery and read scaling. |

Exam Tips

  • Know when to use PaaS vs IaaS: Use Azure SQL Database (PaaS) for new applications. Use SQL Server on Azure VM (IaaS) when you need full engine control or features not available in PaaS. Use SQL Managed Instance for lift-and-shift migrations.
  • Understand DTU vs vCore: DTU bundles resources in fixed tiers (simpler). vCore allows independent resource scaling and supports Azure Hybrid Benefit and reserved capacity (more flexible and cost-effective for predictable workloads).
  • Cosmos DB consistency levels: Strong provides reads of the latest write but with higher latency. Session is the default and most commonly used. Eventual provides the lowest latency. Know the tradeoffs for exam scenarios.
  • RU/s planning for Cosmos DB: A point read of a 1 KB item costs 1 RU. Writes cost 5-10x more. Complex queries cost more based on items scanned. Use autoscale for variable workloads, serverless for low/dev traffic.
  • Data warehouse vs data lake: Data warehouse uses schema-on-write for structured BI queries. Data lake uses schema-on-read for flexible, large-scale analytics on raw data. Modern architectures combine both.
  • Auditing supports multiple destinations: Azure SQL auditing can write to Azure Storage, Log Analytics, and Event Hubs. Server-level auditing applies to all databases; database-level auditing applies to specific databases.

Practice Questions

Question 1

Your application needs a globally distributed database with single-digit millisecond reads and automatic multi-region failover. The data model is document-based with flexible schemas. Which service should you choose?

A. Azure SQL Database with geo-replication
B. Azure Cosmos DB with multi-region writes
C. Azure SQL Managed Instance with failover groups
D. SQL Server on Azure VM with Always On Availability Groups

Answer: B

Azure Cosmos DB is designed for globally distributed, low-latency access with automatic multi-region failover. It natively supports document data models (NoSQL API, MongoDB API) with flexible schemas and guarantees single-digit millisecond latency at the 99th percentile.

Question 2

You are migrating an on-premises SQL Server database that uses cross-database queries, SQL Agent jobs, and CLR stored procedures to Azure. You need minimal application changes. Which service should you use?

A. Azure SQL Database (single database)
B. Azure SQL Database (elastic pool)
C. Azure SQL Managed Instance
D. Azure Cosmos DB

Answer: C

Azure SQL Managed Instance provides near-100% compatibility with on-premises SQL Server, including cross-database queries, SQL Server Agent, and CLR integration. Azure SQL Database does not support these features, making Managed Instance the best choice for lift-and-shift migrations.

Question 3

Your Cosmos DB application experiences highly variable traffic with peaks of 50,000 RU/s but typically uses only 5,000 RU/s. You want to avoid paying for peak capacity during quiet periods. What throughput mode should you choose?

A. Provisioned throughput set to 50,000 RU/s
B. Provisioned throughput with autoscale (max 50,000 RU/s)
C. Serverless mode
D. Provisioned throughput set to 5,000 RU/s

Answer: B

Autoscale provisioned throughput automatically scales between 10% and 100% of the maximum RU/s. Here it would scale between 5,000 and 50,000 RU/s, handling peaks while reducing cost during quiet periods. Serverless is not suited to sustained workloads at this scale, and fixed provisioning at peak capacity wastes money during quiet periods.

Question 4

Your organization needs to consolidate sales data from 15 different source systems for business intelligence reporting. The data volumes are very large (petabytes). Which architecture should you design?

A. Load all data into a single Azure SQL Database
B. Use Azure Synapse Analytics with a dedicated SQL pool for the data warehouse and Azure Data Factory for ETL
C. Store all data in Azure Cosmos DB and query with MongoDB API
D. Use Azure Table Storage with custom reporting

Answer: B

Azure Synapse Analytics with dedicated SQL pools provides massively parallel processing (MPP) designed for petabyte-scale data warehousing. Azure Data Factory handles data ingestion from multiple sources. This is the standard Azure architecture for enterprise-scale BI with large data volumes.

Question 5

You need to track all data access queries on your Azure SQL Database for compliance auditing. The audit logs must be available for analysis using KQL. Where should you configure audit log storage?

A. Azure Blob Storage
B. Azure Event Hubs
C. Log Analytics workspace
D. Azure Table Storage

Answer: C

Sending audit logs to a Log Analytics workspace enables analysis using KQL (Kusto Query Language). While Azure Blob Storage is a valid audit destination, it does not support KQL queries directly. Log Analytics provides powerful querying, alerting, and visualization capabilities for audit data analysis.
