DP-100 Designing and Implementing a Data Science Solution on Azure - Practice Test 1
Question 1
EASY
You need to create a managed cloud-based workstation for a data scientist to run Jupyter notebooks in Azure Machine Learning. Which resource should you create?
A compute instance is a managed cloud workstation optimized for machine learning development. It comes pre-installed with Jupyter, JupyterLab, VS Code, and RStudio. A compute cluster is for scalable training jobs. An inference cluster (AKS) is for deploying models. Attached compute is for bringing your own resources.
See more: Design and Prepare an ML Solution
Question 2
EASY
Which Azure resource is automatically created when you provision an Azure Machine Learning workspace?
When you create an Azure ML workspace, four associated resources are automatically provisioned: Azure Storage Account, Azure Key Vault, Azure Container Registry, and Application Insights. Azure Key Vault stores secrets and credentials used by the workspace.
See more: Design and Prepare an ML Solution
Question 3
MEDIUM
You need to point to a specific CSV file stored in Azure Blob Storage and version it for reproducibility. Which Azure ML concept should you use?
A data asset of type uri_file is a versioned reference to a single file (like a CSV). A datastore is just a connection to a storage service, not a versioned reference to specific data. An mltable defines a tabular schema across files. An environment defines software dependencies.
See more: Design and Prepare an ML Solution
Question 4
EASY
What is a datastore in Azure Machine Learning?
A datastore is a connection reference to an Azure storage service (Blob Storage, Data Lake Storage Gen2, Azure SQL Database, etc.). It stores the connection information so you can access data without exposing credentials in your code.
See more: Design and Prepare an ML Solution
Question 5
MEDIUM
You need to train a deep learning model using GPUs. Which Azure VM series should you use for the compute cluster?
The NC-series and ND-series VMs include NVIDIA GPUs and are designed for deep learning, computer vision, and other GPU-intensive workloads. D-series is general purpose, F-series is compute optimized (CPU), and E-series is memory optimized.
See more: Design and Prepare an ML Solution
Question 6
EASY
Which tool in Azure ML allows you to build ML pipelines using a drag-and-drop interface without writing code?
Azure ML Designer is a drag-and-drop interface for building ML pipelines visually. It supports data preparation, model training, and evaluation components. Automated ML automates algorithm selection but is not a pipeline builder. CLI v2 uses YAML files. Notebooks require code.
See more: Explore Data and Train Models
Question 7
MEDIUM
In Azure ML Designer, which component should you use to divide your dataset into training and test sets?
The Split Data component divides a dataset into two parts (e.g., 80/20 for training and testing). It supports random split, percentage split, and stratified sampling. Select Columns filters columns, Clean Missing Data handles null values, and Normalize Data scales features.
See more: Explore Data and Train Models
Question 8
MEDIUM
Which Automated ML primary metric should you use to optimize a binary classification model when the dataset is highly imbalanced?
AUC_weighted is the recommended primary metric for imbalanced classification datasets because it accounts for the area under the ROC curve weighted by class frequency. Accuracy can be misleading with imbalanced data since predicting the majority class yields high accuracy. r2_score is for regression tasks.
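A quick illustration of why accuracy misleads (toy numbers, not from any real dataset): a classifier that always predicts the majority class still looks highly accurate.

```python
# Toy imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95, even though no positive case was ever detected
```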
See more: Explore Data and Train Models
Question 9
EASY
Which framework does Azure ML use by default for experiment tracking and model logging?
Azure ML uses MLflow as the default tracking and logging framework. MLflow provides APIs for logging parameters, metrics, and model artifacts, and Azure ML Studio natively displays MLflow-logged data. TensorBoard can be used in addition but is not the default.
See more: Explore Data and Train Models
Question 10
MEDIUM
You want to automatically find the best algorithm and hyperparameters for a regression problem. Which Azure ML feature should you use?
Automated ML automatically iterates through multiple algorithms and hyperparameters to find the best model. A sweep job only searches hyperparameters for a single specified algorithm. Designer requires manual configuration. Pipeline jobs chain steps but don't auto-select algorithms.
See more: Explore Data and Train Models
Question 11
MEDIUM
In a training script, you use argparse to define parameters. How do you pass different parameter values when submitting the job using the Azure ML SDK v2?
In Azure ML SDK v2, script parameters are passed via the command string using the ${{inputs.<name>}} placeholder syntax. The inputs dictionary maps these placeholders to actual values. Environment variables and Key Vault are for secrets, not training parameters.
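The script side of this pattern can be sketched with plain argparse (the parameter names here are illustrative, not prescribed by Azure ML):

```python
import argparse

def parse_args(argv=None):
    # A training script exposes hyperparameters as CLI arguments; the job's
    # command string fills them in, e.g.
    #   "python train.py --learning-rate ${{inputs.learning_rate}}"
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)

# Simulate the values the command string would inject at job submission.
args = parse_args(["--learning-rate", "0.001", "--epochs", "5"])
print(args.learning_rate, args.epochs)
```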
See more: Prepare a Model for Deployment
Question 12
MEDIUM
You need to search across multiple hyperparameter combinations to find the best model. Which Azure ML job type should you use?
A sweep job performs hyperparameter tuning by running multiple trials with different parameter combinations. It supports random, grid, and Bayesian sampling. A command job runs a single script execution. A pipeline job chains steps. An AutoML job selects algorithms automatically.
See more: Prepare a Model for Deployment
Question 13
EASY
In an Azure ML pipeline, how does data flow from one step to the next?
In Azure ML pipelines, data flows between steps by connecting the output of one step to the input of the next. Azure ML handles serialization, storage, and deserialization automatically. Steps do not communicate via environment variables, databases, or REST calls.
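As a plain-Python analogy (the step functions are invented for illustration), each step's return value becomes the next step's argument, just as Azure ML wires one step's output to the next step's input:

```python
# Each "step" consumes the previous step's output; in a real pipeline
# Azure ML persists these intermediate outputs to the workspace datastore.
def prep(raw):
    return [x / max(raw) for x in raw]

def train(features):
    return {"weights": sum(features) / len(features)}

def evaluate(model):
    return {"score": round(model["weights"], 3)}

result = evaluate(train(prep([2, 4, 6, 8])))
print(result)  # {'score': 0.625}
```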
See more: Prepare a Model for Deployment
Question 14
MEDIUM
What is the purpose of publishing a pipeline as a pipeline endpoint?
Publishing a pipeline as a pipeline endpoint makes it callable via a REST API. This enables automation, scheduling, and triggering from external systems. It does not deploy models for inference or share across workspaces.
See more: Prepare a Model for Deployment
Question 15
EASY
Which deployment option is best for deploying a model for real-time inference in production with auto-scaling?
Managed online endpoints provide Azure-managed infrastructure for real-time inference with built-in auto-scaling, blue-green deployments, and monitoring. Batch endpoints are for bulk scoring. Compute instances are for development. Pipeline endpoints trigger training pipelines.
See more: Deploy and Retrain a Model
Question 16
MEDIUM
A scoring script for a deployed model must contain which two functions?
Azure ML scoring scripts must define init() and run(). The init() function is called once when the service starts to load the model. The run() function is called for each inference request to process input and return predictions.
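A minimal, self-contained sketch of that contract (the doubling "model" is a stand-in so the script runs anywhere; a real init() would load a serialized model from disk):

```python
import json

model = None

def init():
    # Called once at service startup: load the model into memory.
    global model
    model = lambda xs: [x * 2 for x in xs]  # stand-in for a real model

def run(raw_data):
    # Called per request: parse the JSON payload, score, return predictions.
    data = json.loads(raw_data)["data"]
    return json.dumps({"predictions": model(data)})

init()
print(run('{"data": [1, 2, 3]}'))  # {"predictions": [2, 4, 6]}
```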
See more: Deploy and Retrain a Model
Question 17
MEDIUM
You want to deploy a new version of a model alongside the current version and gradually shift traffic. Which deployment strategy should you use?
Azure ML managed online endpoints support blue-green deployments. You deploy a new version (green) alongside the current (blue), gradually shift traffic from blue to green, and then remove the old deployment once validated.
See more: Deploy and Retrain a Model
Question 18
EASY
Which data asset type in Azure ML defines a tabular schema with column selection and type casting?
MLTable defines a tabular schema on top of one or more files. It supports column selection, type casting, and data transformations. uri_file points to a single file. uri_folder points to a directory. custom_model is a model asset type.
See more: Design and Prepare an ML Solution
Question 19
HARD
In Azure ML Designer, you need to add custom Python logic to a pipeline that is not available in built-in components. Which component should you use?
The Execute Python Script component in Azure ML Designer allows you to add custom Python code to your pipeline. It accepts DataFrames as input and outputs DataFrames, enabling custom feature engineering, transformations, or model logic.
See more: Explore Data and Train Models
Question 20
MEDIUM
Which evaluation metric measures the proportion of actual positive cases that were correctly identified?
Recall (also called sensitivity or true positive rate) measures the proportion of actual positive cases correctly identified: TP / (TP + FN). Precision measures TP / (TP + FP). F1 is the harmonic mean of precision and recall. Accuracy is overall correct predictions / total predictions.
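These definitions are easy to check against toy confusion-matrix counts (the numbers are made up):

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 80, 10, 20, 90

recall = tp / (tp + fn)       # actual positives correctly identified
precision = tp / (tp + fp)    # predicted positives that were correct
f1 = 2 * precision * recall / (precision + recall)

print(recall, precision, round(f1, 3))
```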
See more: Explore Data and Train Models
Question 21
HARD
You define an Azure ML compute cluster with min_instances=0 and max_instances=4. What happens when no jobs are running?
With min_instances=0, the cluster scales down to zero nodes after idle_time_before_scale_down expires (default 120 seconds). At zero nodes, you incur no compute charges. The cluster is not deleted; it scales back up when a job is submitted.
See more: Design and Prepare an ML Solution
Question 22
MEDIUM
An Azure ML environment defines which of the following?
An Azure ML environment encapsulates the software dependencies (conda/pip packages) and Docker base image for running training and scoring scripts. Hardware specs are defined by the VM size of compute targets. Credentials are managed by datastores. Network configuration is at the workspace level.
See more: Prepare a Model for Deployment
Question 23
EASY
What is the recommended way to access Azure ML programmatically using the Python SDK v2?
The Azure ML Python SDK v2 uses the azure-ai-ml package. The primary entry point is MLClient from azure.ai.ml. The azureml.core import was for SDK v1 which is now legacy.
See more: Design and Prepare an ML Solution
Question 24
HARD
You need to deploy a model for processing millions of records overnight. Which endpoint type should you use?
Batch endpoints are designed for asynchronous, high-throughput scoring of large datasets. They process data in parallel and are ideal for overnight or scheduled batch processing. Online endpoints are for real-time, low-latency requests. Pipeline endpoints trigger training pipelines.
See more: Deploy and Retrain a Model
Question 25
MEDIUM
In Automated ML, which option allows you to exclude specific algorithms from the search space?
The blocked algorithms setting in Automated ML lets you exclude specific algorithms from consideration. Exit criteria control timeout and iteration limits. Featurization settings configure preprocessing. Cross-validation folds control model validation.
See more: Explore Data and Train Models
Question 26
EASY
Which Azure service monitors deployed ML web services and collects telemetry for performance analysis?
Application Insights is an associated workspace resource that monitors deployed web services, collecting telemetry data like request rates, response times, and failure rates. Azure Monitor is a broader monitoring service. Key Vault stores secrets. Container Registry stores Docker images.
See more: Design and Prepare an ML Solution
Question 27
MEDIUM
You register a model from a training job. Which model type should you specify if the model was logged using MLflow?
When a model is logged using MLflow (e.g., mlflow.sklearn.log_model()), you should register it with type MLFLOW_MODEL. This enables Azure ML to automatically generate a scoring script and environment for deployment. CUSTOM_MODEL is for models not logged with MLflow.
See more: Deploy and Retrain a Model
Question 28
HARD
In a sweep job, which sampling algorithm uses a probabilistic model to select the next set of hyperparameters based on previous results?
Bayesian sampling builds a probabilistic model of the objective function and uses it to select the most promising hyperparameters for the next trial. Random sampling picks values randomly. Grid sampling exhaustively searches all combinations. Sobol is a quasi-random sequence.
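The difference between grid and random sampling can be sketched with the standard library (the search space below is hypothetical):

```python
import itertools
import random

# Hypothetical hyperparameter search space for a sweep job.
space = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [16, 32]}

# Grid sampling: exhaustively enumerate every combination.
grid = list(itertools.product(*space.values()))
print(len(grid))  # 6 combinations

# Random sampling: draw each trial's values independently.
random.seed(0)
trial = {name: random.choice(values) for name, values in space.items()}
print(trial)
```

Bayesian sampling goes further: it fits a probabilistic model to completed trials and picks the next combination expected to improve the objective, rather than drawing blindly.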
See more: Prepare a Model for Deployment
Question 29
EASY
Which function in the scoring script is called once when the deployed service starts?
The init() function is called once when the service starts. It is used to load the model and any required resources into memory. The run() function is called for each inference request.
See more: Deploy and Retrain a Model
Question 30
MEDIUM
You want to test a managed online endpoint after deployment using the SDK. Which method should you call?
The ml_client.online_endpoints.invoke() method sends a test request to a managed online endpoint. It accepts the endpoint name and a request file or data payload. The other methods listed do not exist in the Azure ML SDK v2.
See more: Deploy and Retrain a Model
Question 31
MEDIUM
Which regression metric represents the proportion of variance in the dependent variable that is predictable from the features?
R-squared (R²) measures the proportion of variance in the target variable explained by the model. An R² of 1.0 means perfect prediction. MAE measures average absolute errors. RMSE penalizes larger errors more heavily. Relative Squared Error is normalized by the total variance.
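The definition R² = 1 − SS_res / SS_tot is simple to verify by hand (toy values):

```python
# Toy regression targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]

mean_y = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.995
```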
See more: Explore Data and Train Models
Question 32
EASY
Which Designer component generates predictions on a test dataset using a trained model?
The Score Model component takes a trained model and a test dataset and generates predictions. Train Model creates the trained model. Evaluate Model compares predictions to actual labels to compute metrics. Select Columns filters columns.
See more: Explore Data and Train Models
Question 33
HARD
You need to handle missing values by replacing them with the column median in an Azure ML Designer pipeline. Which component and setting should you use?
The Clean Missing Data component handles missing values. You can configure it to replace missing values with mean, median, mode, a custom value, or remove rows entirely. The "Replace using median" option substitutes nulls with the column median.
See more: Explore Data and Train Models
Question 34
MEDIUM
What is the primary benefit of using Azure ML pipelines over running individual scripts?
Pipelines provide modularity, reproducibility, and reusability. Each step can be independently executed, tracked, and reused across experiments. GPU optimization is a compute feature. Hyperparameter tuning uses sweep jobs. Model deployment uses endpoints.
See more: Prepare a Model for Deployment
Question 35
EASY
How do you authenticate to a managed online endpoint when auth_mode is set to "key"?
When auth_mode is "key", you authenticate by including the endpoint's primary or secondary key in the Authorization header as a Bearer token: "Authorization: Bearer <key>". The key can be retrieved using ml_client.online_endpoints.get_keys().
See more: Deploy and Retrain a Model
Question 36
MEDIUM
Which pandas method provides count, mean, std, min, max, and quartiles for numeric columns?
df.describe() returns descriptive statistics: count, mean, std, min, 25%, 50%, 75%, and max for numeric columns. df.info() shows column types and non-null counts. df.head() shows first rows. df.shape returns row and column counts.
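For example (assuming pandas is installed; the data is invented):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29],
    "income": [38_000, 52_000, 61_000, 75_000, 44_000],
})
stats = df.describe()
print(stats.loc["mean", "age"])  # 36.8
```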
See more: Explore Data and Train Models
Question 37
HARD
You are setting up Automated ML for a text classification task. Which task type should you specify?
For NLP text classification tasks, specify task type as "text_classification". The standard "classification" type is for tabular data. AutoML for NLP supports text_classification, text_classification_multilabel, and text_ner (named entity recognition).
See more: Explore Data and Train Models
Question 38
MEDIUM
In the scoring script, which environment variable provides the path to the deployed model directory?
AZUREML_MODEL_DIR is the environment variable that contains the path to the directory where the model is deployed. In the init() function, you use os.getenv("AZUREML_MODEL_DIR") to locate and load the model file.
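A runnable sketch of that pattern (the path and filename are illustrative; in a real deployment Azure ML sets the variable for you):

```python
import os

# Simulate the environment Azure ML provides inside the container.
os.environ["AZUREML_MODEL_DIR"] = "/var/azureml-app/azureml-models/mymodel/1"

def get_model_path(filename="model.pkl"):
    # In init(), resolve the model file relative to AZUREML_MODEL_DIR.
    model_dir = os.getenv("AZUREML_MODEL_DIR", "")
    return os.path.join(model_dir, filename)

print(get_model_path())
```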
See more: Deploy and Retrain a Model
Question 39
EASY
Which Azure ML feature allows you to test a deployed model quickly using a low-cost container for development and testing?
Azure Container Instances (ACI) provides a quick, low-cost way to deploy and test models during development. AKS is for production-scale deployments. Compute clusters are for training. Data Lake Storage is for data storage.
See more: Deploy and Retrain a Model
Question 40
HARD
In a managed online endpoint, you have a blue deployment with 100% traffic and want to test a green deployment with 10% of requests. How do you configure this?
Managed online endpoints have built-in traffic routing. Set the traffic dictionary to allocate percentages: {"blue": 90, "green": 10}. This sends 90% of requests to blue and 10% to green. No external load balancer or separate endpoints are needed.
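The allocation itself is just a mapping of deployment names to percentages; in SDK v2 you would assign it to the endpoint's traffic property and submit the update (the deployment names here are illustrative):

```python
# Traffic split for a managed online endpoint: 90% to the current (blue)
# deployment, 10% to the candidate (green) deployment.
traffic = {"blue": 90, "green": 10}

# The percentages across deployments must not exceed 100.
assert sum(traffic.values()) <= 100
print(traffic)
```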
See more: Deploy and Retrain a Model
Question 41
MEDIUM
Which method do you call to log a parameter value during a training run using MLflow?
mlflow.log_param() logs a single key-value parameter (e.g., learning rate, number of epochs). mlflow.log_metric() logs numeric metrics (e.g., accuracy, loss). mlflow.log_artifact() logs files. mlflow.set_tag() adds metadata tags to the run.
See more: Explore Data and Train Models
Question 42
EASY
Which Azure ML web interface is used for managing workspace resources including datasets, experiments, and models?
Azure ML Studio at ml.azure.com is the purpose-built web interface for managing all workspace resources including data assets, experiments, models, endpoints, and compute. Azure Portal is for general Azure resource management.
See more: Design and Prepare an ML Solution
Question 43
MEDIUM
You need to normalize numeric features to have zero mean and unit variance in Azure ML Designer. Which normalization method should you use?
Z-Score normalization (standardization) transforms data to have zero mean and unit standard deviation: (x - mean) / std. Min-Max scales values to a range [0,1]. Both are available in the Normalize Data component.
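The transformation is easy to verify with the standard library (the values are arbitrary):

```python
from statistics import mean, pstdev

values = [10.0, 20.0, 30.0, 40.0, 50.0]
mu, sigma = mean(values), pstdev(values)
zscores = [(v - mu) / sigma for v in values]

# After z-score normalization: mean ~0, standard deviation ~1.
print(round(mean(zscores), 10), round(pstdev(zscores), 10))
```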
See more: Explore Data and Train Models
Question 44
HARD
In Azure ML SDK v2, you use the @pipeline decorator to define a pipeline. What does the default_compute parameter specify?
The default_compute parameter in the @pipeline decorator specifies the compute target used by pipeline steps that don't explicitly define their own compute. Individual steps can override this with their own compute setting.
See more: Prepare a Model for Deployment
Question 45
EASY
Which Automated ML task type should you select for predicting house prices?
Predicting house prices is a regression task because the target variable (price) is a continuous numeric value. Classification predicts categories. Forecasting is for time-series data. Clustering groups similar items.
See more: Explore Data and Train Models
Question 46
MEDIUM
Which scikit-learn function splits a dataset into training and test sets?
train_test_split() from sklearn.model_selection splits arrays or matrices into random train and test subsets. KFold provides k-fold cross-validation split iterators. cross_val_score evaluates a model with cross-validation. split_data does not exist in sklearn.
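For example (assuming scikit-learn is installed; the data is synthetic):

```python
from sklearn.model_selection import train_test_split

X = list(range(10))
y = [0, 1] * 5

# 80/20 split; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```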
See more: Explore Data and Train Models
Question 47
MEDIUM
You need to delete a compute cluster that is no longer needed. Which SDK method should you use?
ml_client.compute.begin_delete() permanently deletes a compute resource. begin_stop() stops a compute instance but doesn't delete it. The remove() and shutdown() methods don't exist in the SDK v2.
See more: Prepare a Model for Deployment
Question 48
EASY
Which method streams the output of a running job to the console?
ml_client.jobs.stream() streams the console output of a running job in real-time. ml_client.jobs.get() retrieves job metadata and status. The logs() and watch() methods don't exist in SDK v2.
See more: Prepare a Model for Deployment
Question 49
HARD
You want to identify correlations between features in your dataset. Which pandas method should you use?
df.corr() computes pairwise correlation coefficients between numeric columns. Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation). df.describe() gives summary statistics. value_counts() counts unique values. groupby() groups data for aggregation.
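For example (assuming pandas is installed; the columns are constructed to show both extremes):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly proportional to x
    "z": [5, 4, 3, 2, 1],    # perfectly inverse to x
})
corr = df.corr()
print(corr.loc["x", "y"], corr.loc["x", "z"])
```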
See more: Explore Data and Train Models
Question 50
MEDIUM
After retraining a model with new data, what is the recommended deployment approach to ensure zero downtime?
The recommended approach is blue-green deployment: create a new deployment with the updated model, test it with a small percentage of traffic, then gradually shift all traffic to the new deployment. This ensures zero downtime and allows rollback if issues are detected.
See more: Deploy and Retrain a Model