AZ-305 - Design a Monitoring Strategy for the Data Platform | JavaInUse

1. Data Monitoring Strategy Overview

A comprehensive data platform monitoring strategy ensures visibility into the health, performance, and security of all data services. Azure provides a unified monitoring stack built on Azure Monitor that integrates with each data service to collect metrics, logs, and diagnostics.

Azure Monitor as the Foundation

Azure Monitor is the central service for collecting, analyzing, and acting on telemetry from Azure resources. It aggregates platform metrics and resource logs from data services into a single view. You can query log data with Kusto Query Language (KQL) in Log Analytics workspaces and build interactive reports with Azure Monitor workbooks.

Key Monitoring Components

  • Metrics: Numeric values collected at regular intervals that describe resource performance (e.g., DTU percentage, RU consumption).
  • Logs: Structured records of events and operations that provide detailed diagnostic information.
  • Alerts: Automated notifications or actions triggered when metric or log conditions meet specified thresholds.
  • Dashboards: Visual summaries of key metrics and logs displayed in Azure Portal or Grafana.

Diagnostic Settings

Diagnostic settings control where platform logs and metrics are sent. Each Azure data resource can be configured to route diagnostics to one or more destinations:

  • Log Analytics workspace (for KQL queries and alerting)
  • Azure Storage account (for long-term archival)
  • Azure Event Hubs (for streaming to external SIEM tools)
  • Partner solutions (third-party integrations)
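As a minimal sketch of how these destinations fit together, the helper below assembles a dictionary shaped like the body of a diagnostic setting resource. The function name `build_diagnostic_setting` and the placeholder resource ID are assumptions for illustration, and the log categories are the Cosmos DB ones listed later in this guide. Check the property names against the current diagnostic settings API reference before using them.

```python
import json


def build_diagnostic_setting(name, log_categories, workspace_id=None,
                             storage_account_id=None,
                             event_hub_rule_id=None, event_hub_name=None):
    """Return a dict shaped like a diagnostic setting (illustrative helper)."""
    destinations = {}
    if workspace_id:
        # Log Analytics workspace: KQL queries and alerting
        destinations["workspaceId"] = workspace_id
    if storage_account_id:
        # Storage account: long-term archival
        destinations["storageAccountId"] = storage_account_id
    if event_hub_rule_id:
        # Event Hubs: streaming to external tools such as a SIEM
        destinations["eventHubAuthorizationRuleId"] = event_hub_rule_id
        if event_hub_name:
            destinations["eventHubName"] = event_hub_name
    if not destinations:
        raise ValueError("a diagnostic setting needs at least one destination")

    properties = {
        "logs": [{"category": c, "enabled": True} for c in log_categories],
    }
    properties.update(destinations)
    return {"name": name, "properties": properties}


# Placeholder ID: substitute a real workspace resource ID.
setting = build_diagnostic_setting(
    "send-to-workspace",
    ["DataPlaneRequests", "QueryRuntimeStatistics"],
    workspace_id="<log-analytics-workspace-resource-id>",
)
print(json.dumps(setting, indent=2))
```

One setting can name several destinations at once, which is why the helper accepts all of them as optional arguments and only rejects the case where none is given.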

2. Monitoring Azure SQL

Azure SQL Database Metrics

Azure SQL Database exposes a rich set of platform metrics through Azure Monitor. Key metrics include CPU percentage, Data IO percentage, DTU percentage (for DTU-based tiers), memory usage, and connection counts. These metrics are available at one-minute granularity and retained for 93 days by default.

Query Performance Insight

Query Performance Insight identifies the top resource-consuming queries in your Azure SQL Database. It relies on Query Store being enabled and combines Query Store data with Azure Monitor metrics to show which queries are driving CPU, data IO, and log IO consumption, as well as which run longest or most often. This helps you optimize query performance without setting up additional monitoring tools.

Intelligent Insights

Intelligent Insights uses built-in intelligence to detect performance anomalies in Azure SQL Database. It analyzes query execution patterns and generates diagnostic logs when it detects degradation such as excessive waits, regression in query plans, or resource limits reached.

Auditing for Azure SQL

Azure SQL auditing tracks database events and writes them to an audit log in Azure Storage, Log Analytics, or Event Hubs. Audit policies can be configured at the server level (applying to all databases) or at the individual database level. Auditing captures events such as successful and failed logins, schema changes, and data access.

| Azure SQL Metric | Description | Alert Threshold Example |
| --- | --- | --- |
| CPU Percentage | Percentage of allocated CPU consumed | Greater than 80% for 5 minutes |
| DTU Percentage | Percentage of DTU capacity used | Greater than 90% for 10 minutes |
| Data IO Percentage | Percentage of data IO consumed | Greater than 85% for 5 minutes |
| Deadlocks | Number of deadlocks detected | Greater than 0 |
| Failed Connections | Count of failed connection attempts | Greater than 10 in 5 minutes |

3. Monitoring Azure Cosmos DB

Cosmos DB Metrics

Azure Cosmos DB exposes metrics for throughput consumption, storage, availability, and latency. The most important metric is Normalized RU Consumption, which shows how much of the provisioned throughput is being used. A value approaching 100% indicates that requests may be throttled (HTTP 429 responses).

Request Unit (RU) Monitoring

Every Cosmos DB operation consumes Request Units. The Total Request Units metric shows aggregate consumption and can be split by dimensions such as operation type and status code. To identify hot partitions, split the Normalized RU Consumption metric by partition key range: if one range consistently runs much hotter than the others, your partition key strategy may need revision. Use Azure Monitor metrics or the Cosmos DB Insights workbook to visualize RU consumption.
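The arithmetic behind hot-partition detection can be sketched as follows. The sketch assumes provisioned throughput is divided evenly across physical partitions and that the normalized figure for a container is the highest utilization among its partition key ranges. The function names, the sample numbers, and the 70% cutoff (borrowed from the scale-warning alert suggested in this guide) are illustrative only.

```python
def normalized_ru_by_range(provisioned_ru_per_sec, consumed_ru_per_sec_by_range):
    """Utilization (%) of each partition key range's share of throughput."""
    # Provisioned throughput is split evenly across the ranges.
    budget = provisioned_ru_per_sec / len(consumed_ru_per_sec_by_range)
    return {
        range_id: min(100.0, 100.0 * used / budget)
        for range_id, used in consumed_ru_per_sec_by_range.items()
    }


def hot_ranges(normalized, threshold_pct=70.0):
    """Partition key ranges whose utilization exceeds the threshold."""
    return sorted(r for r, pct in normalized.items() if pct > threshold_pct)


# Hypothetical container: 10,000 RU/s over four ranges (2,500 RU/s each).
consumed = {"0": 600, "1": 2400, "2": 500, "3": 700}
per_range = normalized_ru_by_range(10_000, consumed)

print(per_range)                # per-range utilization
print(max(per_range.values()))  # container-level normalized consumption
print(hot_ranges(per_range))    # ranges that look hot
```

In this sample the container uses only 4,200 of its 10,000 RU/s overall, yet range "1" sits at 96% of its own share, which is the situation where throttling appears despite apparently spare capacity.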

Cosmos DB Diagnostic Logs

When diagnostic settings are enabled, Cosmos DB can log detailed information about every data-plane request, including the operation type, status code, RU charge, duration, and partition key range. These logs are invaluable for troubleshooting throttled requests and latency spikes.
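To illustrate the kind of troubleshooting these logs support, the sketch below summarizes throttling from a handful of request records. The records are made-up sample data, and the field names (`operation`, `status_code`, `ru_charge`, `partition_key_range`) are simplified stand-ins chosen for this example rather than the actual log schema.

```python
from collections import Counter

# Hypothetical, simplified request records (not the real log schema).
records = [
    {"operation": "Query",  "status_code": 200, "ru_charge": 12.4, "partition_key_range": "1"},
    {"operation": "Upsert", "status_code": 429, "ru_charge": 0.0,  "partition_key_range": "1"},
    {"operation": "Read",   "status_code": 200, "ru_charge": 1.0,  "partition_key_range": "0"},
    {"operation": "Upsert", "status_code": 429, "ru_charge": 0.0,  "partition_key_range": "1"},
    {"operation": "Query",  "status_code": 200, "ru_charge": 48.9, "partition_key_range": "2"},
    {"operation": "Upsert", "status_code": 201, "ru_charge": 10.3, "partition_key_range": "3"},
]


def summarize_throttling(records):
    """Count HTTP 429 responses overall and per partition key range."""
    throttled = [r for r in records if r["status_code"] == 429]
    rate = len(throttled) / len(records) if records else 0.0
    return {
        "throttled": len(throttled),
        "throttle_rate": rate,
        "by_range": dict(Counter(r["partition_key_range"] for r in throttled)),
    }


print(summarize_throttling(records))
```

If the throttled requests cluster on one partition key range, as they do in this sample, the cause is more likely a hot partition than an overall shortage of provisioned throughput.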

Cosmos DB Alerts

Recommended alerts for Cosmos DB include: Normalized RU Consumption exceeding 70% (scale warning), server-side latency exceeding your SLA threshold, total request count with status 429 (throttled), and region failover events for multi-region accounts.

4. Monitoring Azure Storage

Storage Analytics

Azure Storage provides Storage Analytics, the classic option for logging and metrics collection. Storage Analytics logging records details about successful and failed requests to the blob, queue, and table services; it is not available for Azure Files. Metrics include transaction counts, ingress/egress data volumes, availability, and end-to-end latency.

Storage Insights

Azure Monitor Storage Insights provides a unified view of storage account performance, capacity, and availability. It uses Azure Monitor metrics and does not require Storage Analytics to be enabled. Storage Insights is the recommended approach for monitoring storage in new deployments.

Capacity Monitoring

Monitor the account-level Used Capacity metric, together with the per-service capacity metrics (for example, Blob Capacity and File Capacity), to track growth trends and plan for capacity. Set alerts when capacity approaches account limits (the default maximum is 5 PiB per standard storage account).
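A rough capacity check might look like the sketch below. It takes the 5 PiB figure quoted above as the limit and assumes linear growth, which is a simplification; the function names, the 80% warning ratio, and the sample figures are assumptions made for this example.

```python
TIB = 2 ** 40
PIB = 2 ** 50
ACCOUNT_LIMIT_BYTES = 5 * PIB  # limit quoted above for standard accounts


def capacity_utilization(used_bytes, limit_bytes=ACCOUNT_LIMIT_BYTES):
    """Fraction of the account limit currently in use."""
    return used_bytes / limit_bytes


def should_warn(used_bytes, warn_ratio=0.8, limit_bytes=ACCOUNT_LIMIT_BYTES):
    """True once utilization reaches the warning ratio."""
    return capacity_utilization(used_bytes, limit_bytes) >= warn_ratio


def days_until_limit(used_bytes, daily_growth_bytes, limit_bytes=ACCOUNT_LIMIT_BYTES):
    """Days left at a constant daily growth rate (None if not growing)."""
    if daily_growth_bytes <= 0:
        return None
    return (limit_bytes - used_bytes) / daily_growth_bytes


# Hypothetical account: 4 PiB used, growing by 8 TiB per day.
used = 4096 * TIB
print(capacity_utilization(used))        # fraction of the limit in use
print(should_warn(used))                 # whether to raise a warning
print(days_until_limit(used, 8 * TIB))   # days of headroom at this rate
```

Projecting days of headroom from a growth trend gives more lead time than a fixed threshold alone, since an account at 60% that is growing quickly may need attention sooner than one sitting steadily at 85%.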

5. Alerting and Audit Logs

Alert Rules

Azure Monitor supports three types of alert rules for data platform monitoring:

  • Metric alerts: Evaluate metric values at regular intervals and trigger when a condition is met (e.g., CPU above 80%).
  • Log alerts: Run KQL queries against Log Analytics data and trigger based on the number of results or a metric measurement.
  • Activity log alerts: Trigger on specific Azure management-plane operations (e.g., database deleted, firewall rule changed).
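The evaluation logic of a metric alert can be approximated in a few lines. This is a simplified model rather than Azure Monitor's actual implementation: it assumes one sample per minute, uses the average as the aggregation, and fires when the average over the look-back window exceeds the threshold. The function name and sample values are illustrative.

```python
from statistics import mean


def metric_alert_fires(samples, threshold, window):
    """True if the average of the last `window` samples exceeds the threshold."""
    if len(samples) < window:
        # Not enough data to fill the evaluation window yet.
        return False
    return mean(samples[-window:]) > threshold


# One CPU percentage sample per minute (hypothetical values).
cpu_percent = [62, 78, 85, 91, 88, 84, 90]

# "CPU above 80% for 5 minutes" modeled as a 5-sample average.
print(metric_alert_fires(cpu_percent, threshold=80, window=5))
```

Averaging over a window keeps a single one-minute spike from firing the alert, which is why threshold examples are usually stated with a duration.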

Action Groups

Action groups define the notification and remediation actions taken when an alert fires. Actions can include email, SMS, push notifications, voice calls, Azure Functions, Logic Apps, webhooks, and ITSM connectors. A single action group can be reused across multiple alert rules.

Audit Logs for Data Services

Audit logs capture administrative and data-plane operations. For compliance, enable Azure SQL auditing, Cosmos DB diagnostic logging, and Azure Storage resource logs through diagnostic settings (Storage Analytics logging is the legacy alternative). Route all audit data to a centralized Log Analytics workspace for cross-service correlation and long-term retention. Azure Policy can enforce diagnostic settings across all data resources.
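Enforcement through Azure Policy typically relies on the DeployIfNotExists effect. The sketch below builds the outline of such a policy rule as a Python dictionary so its shape is easy to inspect. It is a skeleton under stated assumptions, not a complete definition: the `deployment` section that would carry the template creating the diagnostic setting is deliberately omitted, the role definition ID is a placeholder, and the function name and the `logAnalytics` parameter name are chosen for this example.

```python
import json


def diagnostic_settings_policy_rule(resource_type, role_definition_id):
    """Outline of a DeployIfNotExists rule (illustrative, incomplete)."""
    return {
        # Which resources the policy evaluates.
        "if": {"field": "type", "equals": resource_type},
        "then": {
            "effect": "deployIfNotExists",
            "details": {
                # The related resource that must exist.
                "type": "Microsoft.Insights/diagnosticSettings",
                # Role the policy's managed identity needs for remediation.
                "roleDefinitionIds": [role_definition_id],
                # A resource is compliant if a setting already targets
                # the workspace passed in as a policy parameter.
                "existenceCondition": {
                    "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
                    "equals": "[parameters('logAnalytics')]",
                },
                # "deployment": {...} omitted: a real definition needs it.
            },
        },
    }


rule = diagnostic_settings_policy_rule(
    "Microsoft.Sql/servers/databases",
    "<role-definition-resource-id>",  # placeholder, not a real ID
)
print(json.dumps(rule, indent=2))
```

Starting from one of the built-in diagnostic settings policies for the relevant resource type is usually a safer route than writing the definition from scratch.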

| Data Service | Primary Monitoring Tool | Key Diagnostic Logs |
| --- | --- | --- |
| Azure SQL Database | Azure Monitor, Query Performance Insight | SQL Auditing, Intelligent Insights |
| Azure Cosmos DB | Cosmos DB Insights, Azure Monitor | DataPlaneRequests, QueryRuntimeStatistics |
| Azure Storage | Storage Insights, Azure Monitor | StorageRead, StorageWrite, StorageDelete |
| Azure Data Lake | Azure Monitor | Requests, Filesystem operations |

Key Terms

| Term | Definition |
| --- | --- |
| Diagnostic Settings | Configuration that routes platform logs and metrics from an Azure resource to one or more destinations (Log Analytics, Storage, Event Hubs). |
| KQL (Kusto Query Language) | Query language used in Log Analytics to analyze log data collected by Azure Monitor. |
| Normalized RU Consumption | Cosmos DB metric showing the percentage of provisioned throughput being consumed; values near 100% indicate throttling risk. |
| Query Performance Insight | Azure SQL feature that identifies top resource-consuming queries by combining query store data with Azure Monitor metrics. |
| Intelligent Insights | Azure SQL built-in intelligence that detects and diagnoses performance anomalies such as query regressions and excessive waits. |
| Action Group | A collection of notification and remediation actions (email, SMS, webhook, Azure Function) triggered when an alert fires. |
| Storage Insights | Azure Monitor workbook providing a unified view of storage account performance, capacity, and availability without requiring Storage Analytics. |

Exam Tips

  • Diagnostic settings are the primary mechanism for routing logs from data services. Know the three main destinations: Log Analytics workspace, Storage account, and Event Hubs. Partner solutions are a fourth option.
  • For Azure SQL, Query Performance Insight is the built-in tool for identifying resource-heavy queries. Intelligent Insights automatically detects anomalies.
  • Normalized RU Consumption is the critical Cosmos DB metric. When it approaches 100%, requests may be throttled with HTTP 429 responses.
  • Storage Insights is the recommended monitoring approach for Azure Storage. Storage Analytics logging is the legacy solution.
  • Action groups are reusable across alert rules. A single action group can contain multiple notification types (email, SMS, webhook).
  • Azure Policy can enforce diagnostic settings across all resources in a subscription, ensuring consistent monitoring coverage.

Practice Questions

Question 1

You need to stream Azure SQL Database audit logs to a third-party SIEM system in real time. Where should you send the diagnostic logs?

A. Azure Storage account

B. Log Analytics workspace

C. Azure Event Hubs

D. Azure Blob Storage with lifecycle management

Answer: C

Azure Event Hubs is designed for real-time streaming of diagnostic logs to external systems such as SIEM tools. Storage accounts are suitable for archival, and Log Analytics is for Azure-native querying and alerting.

Question 2

Your Cosmos DB application is experiencing intermittent HTTP 429 responses. Which metric should you monitor to diagnose this issue?

A. Total Request Units

B. Normalized RU Consumption

C. Data Storage Size

D. Available Storage

Answer: B

Normalized RU Consumption shows the percentage of provisioned throughput being used, reported as the highest utilization across all partition key ranges. When this value approaches 100%, requests to the busiest range are likely to be throttled with HTTP 429 responses. Total Request Units shows aggregate consumption but does not indicate how close you are to the provisioned limit.

Question 3

You want to identify the most resource-consuming queries in your Azure SQL Database without deploying additional monitoring tools. Which built-in feature should you use?

A. Dynamic Management Views (DMVs)

B. Query Performance Insight

C. Azure Advisor

D. SQL Profiler

Answer: B

Query Performance Insight is a built-in Azure SQL Database feature that combines query store data with Azure Monitor metrics to identify top resource-consuming queries. It requires no additional tooling. SQL Profiler is a legacy on-premises tool not available for Azure SQL Database.

Question 4

You need to ensure that all Azure data resources in your subscription have diagnostic settings configured to send logs to a Log Analytics workspace. What should you use?

A. Azure Monitor alert rules

B. Azure Policy with DeployIfNotExists effect

C. Azure Automation runbooks

D. Resource Manager template deployment

Answer: B

Azure Policy with the DeployIfNotExists effect can automatically deploy diagnostic settings to resources that lack them. This ensures consistent monitoring coverage across the subscription without manual intervention.

Question 5

You want to trigger an Azure Function automatically when the CPU percentage of your Azure SQL Database exceeds 90% for more than 5 minutes. What do you need to configure?

A. A metric alert rule with an action group that includes the Azure Function

B. A log alert rule querying the Activity Log

C. An Event Grid subscription on the SQL Database

D. A Storage Queue triggered by diagnostic logs

Answer: A

A metric alert rule evaluates the CPU percentage metric at regular intervals. When the condition is met (greater than 90% for 5 minutes), it triggers an action group that includes the Azure Function as an action. Log alerts query log data, not real-time metrics.
