AWS AI Practitioner - Amazon SageMaker - Deep Dive | JavaInUse

What is Amazon SageMaker?

Amazon SageMaker is a fully managed, end-to-end machine learning platform that enables developers and data scientists to build, train, tune, and deploy ML models without managing underlying infrastructure. It consolidates every stage of the ML lifecycle into one service.

Core Functions

  • Collect and prepare training data
  • Build and train ML models at scale
  • Automatically tune hyperparameters
  • Deploy models with one click
  • Monitor model performance in production

Workflow Example

Description: Predict a student's exam score based on historical data

Input Features:

  • Years of IT experience
  • Years of AWS experience
  • Hours spent studying

Output: Predicted exam score (e.g., 906)

Flow: Historical data -> Feature engineering -> SageMaker training -> Deployed model -> Real-time predictions
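
Conceptually, this flow is just "fit a function to historical data, then query it for new inputs". A minimal local sketch with numpy (the records and scores below are made up for illustration; real SageMaker training runs on managed infrastructure, not in-process):

```python
import numpy as np

# Hypothetical historical data: [years IT exp, years AWS exp, study hours]
X = np.array([
    [10, 3, 40],
    [2, 1, 25],
    [7, 5, 60],
    [4, 2, 35],
], dtype=float)
y = np.array([820, 650, 910, 720], dtype=float)  # past exam scores

# "Training": ordinary least squares with an intercept column appended
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# "Inference": predict a score for a new candidate (5 yrs IT, 3 yrs AWS, 50 hrs)
new_candidate = np.array([5, 3, 50, 1], dtype=float)
predicted = float(new_candidate @ coef)
print(round(predicted))
```

The deployed-model step in the flow corresponds to hosting `coef` behind an endpoint so the final line can run per request.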

Built-In Algorithms

Note: You do not need to memorize these for the exam. They illustrate SageMaker's breadth.

Supervised:

  • Linear regression
  • Classification (KNN)

Unsupervised:

  • PCA - reduce feature dimensions
  • K-means - find data clusters
  • Anomaly detection - flag outliers for fraud detection
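
SageMaker's built-in anomaly detector is the Random Cut Forest algorithm, but the core idea (flag points that sit far from the rest of the data) can be illustrated with a much simpler z-score rule. This is a toy stand-in, not the SageMaker algorithm:

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Mostly routine transaction amounts, plus one suspicious spike
amounts = [12.5, 9.9, 11.2, 10.4, 13.1, 9800.0]
print(zscore_outliers(amounts, threshold=2.0))  # flags the 9800.0 transaction
```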

Text and NLP:

  • Natural language processing
  • Text summarization

Image Processing:

  • Image classification
  • Object detection

Key Terms

  • Amazon SageMaker -- A fully managed ML platform covering the entire lifecycle: data preparation, model training, hyperparameter tuning, deployment, and monitoring, all from a single unified service.
  • SageMaker Studio -- The unified web-based IDE for SageMaker. It provides a single interface for end-to-end ML development: notebooks, experiments, pipelines, model registry, and deployment tools.

Exam Tips:
  • SageMaker = one-stop ML platform. Know the full lifecycle: prepare -> train -> tune -> deploy -> monitor.
  • SageMaker AI (in the console) is the correct service for building/training/deploying models. The plain 'SageMaker' listing in the console is a higher-level entry point.
  • SageMaker Studio is the unified UI and is the main interface for all SageMaker capabilities.

Practice Questions

Q1. A data science team wants a single platform to prepare data, train models, tune hyperparameters, and deploy predictions without managing servers. Which AWS service is the best fit?

  • AWS Lambda -- for serverless compute
  • Amazon SageMaker -- for end-to-end ML lifecycle management
  • Amazon EC2 with GPU instances -- for custom ML workloads
  • Amazon Bedrock -- for foundation model access

Answer: B

Amazon SageMaker is the fully managed end-to-end ML platform that handles every stage of the ML lifecycle -- data preparation, training, tuning, deployment, and monitoring -- without requiring teams to provision or manage servers.

Q2. What is SageMaker Studio?

  • A command-line interface for SageMaker
  • The unified web-based IDE for end-to-end ML development
  • A standalone data visualization tool
  • A mobile app for monitoring models

Answer: B

SageMaker Studio is the unified web-based IDE that provides a single interface for all SageMaker capabilities including notebooks, experiments, pipelines, model registry, and deployment tools.

Q3. Which of the following is NOT a core function of Amazon SageMaker?

  • Collect and prepare training data
  • Build and train ML models at scale
  • Manage relational databases
  • Deploy models with one click

Answer: C

SageMaker handles ML lifecycle functions: data preparation, model training, hyperparameter tuning, deployment, and monitoring. Managing relational databases is handled by Amazon RDS, not SageMaker.

Q4. What types of built-in algorithms does SageMaker provide?

  • Only supervised learning algorithms
  • Only unsupervised learning algorithms
  • Supervised, unsupervised, NLP, and image processing algorithms
  • Only deep learning neural networks

Answer: C

SageMaker provides built-in algorithms across multiple categories: supervised learning (linear regression, KNN), unsupervised learning (PCA, K-means, anomaly detection), text/NLP (natural language processing, text summarization), and image processing (image classification, object detection).

Q5. A company wants to predict product demand based on historical sales, weather data, and marketing spend. Which AWS service should they use to build and deploy this ML model?

  • Amazon Comprehend
  • Amazon Rekognition
  • Amazon SageMaker
  • Amazon Translate

Answer: C

Amazon SageMaker is the appropriate choice for building custom ML models from structured historical data. The other services are pre-built AI services for specific tasks (NLP, image analysis, translation), not custom model development.

Automatic Model Tuning (AMT)

Automatic Model Tuning (AMT) removes the manual, trial-and-error process of finding the best hyperparameters for a model. You define what you want to optimize; SageMaker handles the search automatically.

How It Works

  • You define the objective metric (e.g., maximize accuracy, minimize loss)
  • AMT automatically explores values within the hyperparameter ranges you define
  • AMT chooses a search strategy (random, Bayesian, Hyperband)
  • AMT determines how long to run the tuning job
  • Early stopping terminates unpromising configurations to save cost
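
The loop above can be caricatured locally. This sketch simulates random search with an early-stop rule over a made-up objective function; the parameter names and objective are hypothetical, and real AMT also offers Bayesian and Hyperband strategies and launches actual training jobs:

```python
import random

def tune(objective, space, trials=20, patience=5, seed=0):
    """Random-search caricature of AMT: sample hyperparameters from the given
    ranges, track the best objective score, and stop early once `patience`
    consecutive trials fail to improve it."""
    rng = random.Random(seed)
    best_params, best_score, since_improved = None, float("-inf"), 0
    for _ in range(trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score, since_improved = params, score, 0
        else:
            since_improved += 1
            if since_improved >= patience:  # early stop: no recent improvement
                break
    return best_params, best_score

# Toy objective: "accuracy" peaks at learning_rate ~ 0.1, batch_frac ~ 0.5
def objective(p):
    return 1.0 - (p["learning_rate"] - 0.1) ** 2 - (p["batch_frac"] - 0.5) ** 2

space = {"learning_rate": (0.001, 0.3), "batch_frac": (0.1, 1.0)}
params, score = tune(objective, space)
print(round(score, 3), params)
```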

Benefits

  • Saves engineering time -- no manual grid search needed
  • Reduces wasted compute cost via early stop conditions
  • Finds better-performing configurations than manual tuning

Key Terms

  • Automatic Model Tuning (AMT) -- A SageMaker feature that automatically searches for the optimal combination of hyperparameters to maximize or minimize a defined objective metric.
  • Hyperparameter -- A configuration value set before training begins that controls how the training algorithm behaves, e.g., learning rate, number of layers, batch size.
  • Objective Metric -- The metric AMT tries to optimize, for example validation accuracy or loss. You define this goal; AMT navigates the hyperparameter space to achieve it.
  • Early Stop Condition -- A rule in AMT that terminates a tuning job automatically if it is not improving toward the objective metric, preventing wasted compute spend.

Exam Tips:
  • AMT = automated hyperparameter tuning. You set the goal; SageMaker finds the best settings.
  • Key benefit: saves time AND money through early stop conditions.
  • If the exam describes 'trying different parameter combinations to optimize model performance' -> AMT.

Practice Questions

Q1. A team is training a classification model and wants to automatically find the best combination of learning rate, batch size, and number of epochs to maximize validation accuracy -- without running hundreds of experiments manually. Which SageMaker feature should they use?

  • SageMaker Clarify -- for model explainability
  • SageMaker Automatic Model Tuning -- to automatically search hyperparameter combinations
  • SageMaker Pipelines -- to orchestrate the training workflow
  • SageMaker Feature Store -- to store engineered features

Answer: B

Automatic Model Tuning (AMT) is designed exactly for this use case. The team defines the objective metric (maximize validation accuracy) and the hyperparameter ranges, and AMT automatically runs experiments to find the optimal configuration -- with early stopping to avoid wasted compute.

Q2. What is a hyperparameter in machine learning?

  • The output prediction of a trained model
  • A configuration value set before training that controls how the algorithm behaves
  • The training data used to build the model
  • The accuracy metric after model evaluation

Answer: B

A hyperparameter is a configuration value set before training begins that controls how the training algorithm behaves -- examples include learning rate, number of layers, and batch size. Unlike model parameters, hyperparameters are not learned from data.

Q3. How does AMT's early stop condition help reduce costs?

  • By limiting the number of features used in training
  • By terminating unpromising tuning jobs automatically before they complete
  • By reducing the size of the training dataset
  • By using spot instances for all experiments

Answer: B

Early stop conditions in AMT automatically terminate tuning jobs that are not improving toward the objective metric. This prevents wasted compute spend on configurations that clearly won't produce optimal results.

Q4. Which search strategies does SageMaker AMT support for exploring hyperparameter space?

  • Only random search
  • Only grid search
  • Random, Bayesian, and Hyperband strategies
  • Only manual configuration

Answer: C

SageMaker AMT supports multiple search strategies: random search, Bayesian optimization, and Hyperband. AMT automatically selects the appropriate strategy based on the tuning job configuration.

Q5. What is the 'objective metric' in Automatic Model Tuning?

  • The algorithm used for training
  • The metric AMT tries to optimize, such as accuracy or loss
  • The number of training epochs
  • The size of the training dataset

Answer: B

The objective metric is the metric AMT tries to maximize or minimize -- for example, validation accuracy or loss. You define this goal, and AMT navigates the hyperparameter space to achieve it.

Deployment & Inference Options

Once a model is trained, SageMaker offers four inference deployment patterns depending on latency requirements, payload size, and volume needs. Choosing the right option is a key exam topic.

Comparison Table

Real-Time Inference

Latency: Low (milliseconds)

Payload Size: Up to 6 MB

Processing Time: Up to 60 seconds

Request Volume: Single record per request

Scaling: Auto-scaling must be configured manually

Storage: Direct response (no S3 staging)

Use Case: Live predictions -- fraud checks, product recommendations, real-time scoring

Key Exam Keyword: Real-time, immediate response, low latency

Serverless Inference

Latency: Low (but possible cold start delay)

Payload Size: Up to 6 MB

Processing Time: Up to 60 seconds

Request Volume: Single record per request

Scaling: Automatic -- no configuration needed

Storage: Direct response (no S3 staging)

Use Case: Sporadic or unpredictable traffic -- no infrastructure to manage

Key Exam Keyword: No infrastructure management, serverless, cold start trade-off

Asynchronous Inference

Latency: Near-real-time (minutes)

Payload Size: Up to 1 GB

Processing Time: Up to 1 hour

Request Volume: Single large payload per request

Scaling: Automatic via queue

Storage: Input/output via Amazon S3

Use Case: Large payloads that require extended processing -- video analysis, large documents

Key Exam Keyword: Near-real-time, large payload, queue-based, S3 staging

Batch Transform

Latency: High (minutes to hours)

Payload Size: 100 MB per mini-batch (unlimited total)

Processing Time: Up to 1 hour

Request Volume: Entire dataset (many records at once)

Scaling: Concurrent mini-batch processing

Storage: Input/output via Amazon S3

Use Case: Offline predictions on entire datasets -- monthly scoring, bulk processing

Key Exam Keyword: Batch, entire dataset, high latency acceptable, S3

A cold start in serverless inference occurs when the endpoint has been idle and must spin up compute resources before processing the first request. This adds a few seconds of latency to the initial call.

Decision Guide

  • Need the answer immediately + small payload -> Real-Time Inference
  • Need the answer immediately + don't want to manage infrastructure -> Serverless Inference
  • Large payload (up to 1 GB) + can wait minutes -> Asynchronous Inference
  • Processing an entire dataset at once + latency doesn't matter -> Batch Transform
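
The decision guide maps naturally onto a small decision function. A sketch using the limits quoted in this section (the function name and argument names are illustrative, not an AWS API):

```python
def choose_inference(payload_mb, needs_immediate, whole_dataset, manage_infra_ok=True):
    """Pick one of the four SageMaker inference options from the scenario's
    requirements, following the decision guide in this section."""
    if whole_dataset:
        return "Batch Transform"          # entire dataset, latency acceptable
    if payload_mb > 6:
        return "Asynchronous Inference"   # beyond the 6 MB real-time limit
    if needs_immediate and not manage_infra_ok:
        return "Serverless Inference"     # low latency, no infra to manage
    if needs_immediate:
        return "Real-Time Inference"      # low latency, configure scaling
    return "Asynchronous Inference"       # can wait minutes

print(choose_inference(payload_mb=800, needs_immediate=False, whole_dataset=False))
# -> Asynchronous Inference
print(choose_inference(payload_mb=0.1, needs_immediate=True,
                       whole_dataset=False, manage_infra_ok=False))
# -> Serverless Inference
```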

Key Terms

  • Real-Time Inference (SageMaker) -- A deployment mode that provides immediate, low-latency responses for individual predictions. Auto-scaling must be configured manually. Payload limit: 6 MB.
  • Serverless Inference (SageMaker) -- A deployment mode with automatic scaling and no infrastructure management. Suited for sporadic traffic, but may incur a cold start delay after idle periods.
  • Asynchronous Inference (SageMaker) -- A deployment mode for large payloads (up to 1 GB) requiring longer processing. Requests are queued via S3; results are returned asynchronously to another S3 location.
  • Batch Transform (SageMaker) -- A deployment mode for running predictions across an entire dataset stored in S3. Processes multiple records concurrently; high latency but high throughput.
  • Cold Start (Serverless) -- The initial latency overhead when a serverless endpoint spins up from an idle state. Affects only the first request after a period of inactivity.

Exam Tips:
  • Four inference types: Real-Time, Serverless, Asynchronous, Batch. Know the key differentiators for each.
  • Real-Time vs. Serverless: both are low latency for small payloads. Difference = Serverless has no infra management but has cold start risk.
  • Asynchronous = near-real-time keyword + large payload (up to 1 GB) + S3 staging.
  • Batch Transform = entire DATASET, not a single record. Multiple records processed together.
  • Serverless = 'no infrastructure to manage' is the exam signal.
  • Asynchronous and Batch both use S3 for input and output.

Practice Questions

Q1. A company runs daily risk scoring on a dataset of 500,000 customer records stored in S3. Results are needed within a few hours and do not require real-time responses. Which SageMaker inference type is MOST appropriate?

  • Real-Time Inference -- for immediate scoring of each record
  • Serverless Inference -- to avoid managing infrastructure
  • Asynchronous Inference -- for large single-payload processing
  • Batch Transform -- for processing an entire dataset offline with results written to S3

Answer: D

Batch Transform is designed for offline predictions across entire datasets. It processes multiple records concurrently from S3, writes results back to S3, and is ideal for scheduled, non-time-critical bulk processing like daily risk scoring.

Q2. A media company needs to run content analysis on video files that can be up to 800 MB each. Processing each file can take up to 45 minutes. Which inference type should they use?

  • Real-Time Inference -- for low-latency video processing
  • Serverless Inference -- to scale automatically for video workloads
  • Asynchronous Inference -- for large payloads requiring extended processing time with near-real-time results
  • Batch Transform -- to process all videos in one job

Answer: C

Asynchronous Inference supports payloads up to 1 GB and processing times up to 1 hour, making it ideal for large individual files like 800 MB videos with 45-minute processing requirements. The job is queued via S3 and results are written asynchronously.

Q3. What is a 'cold start' in serverless inference?

  • When the model fails to load
  • Initial latency when the endpoint spins up from an idle state
  • When the training data is not cached
  • When the model makes incorrect predictions

Answer: B

A cold start occurs when a serverless endpoint has been idle and must spin up compute resources before processing the first request. This adds a few seconds of latency to the initial call after a period of inactivity.

Q4. A financial services company needs instant fraud detection for each credit card transaction. Which SageMaker inference type is MOST appropriate?

  • Batch Transform
  • Asynchronous Inference
  • Real-Time Inference
  • No inference needed -- use a rule-based system

Answer: C

Real-Time Inference provides immediate, low-latency responses for individual predictions -- exactly what's needed for live fraud detection where each transaction must be scored in milliseconds before approval.

Q5. Which SageMaker inference type should you choose if you don't want to manage any infrastructure and have sporadic, unpredictable traffic?

  • Real-Time Inference with manual auto-scaling
  • Serverless Inference with automatic scaling
  • Batch Transform
  • Asynchronous Inference

Answer: B

Serverless Inference requires no infrastructure management and scales automatically. It's ideal for sporadic or unpredictable traffic patterns, though it may incur cold start delays after idle periods.

Data Preparation -- Data Wrangler & Feature Store

Data Wrangler

Overview: SageMaker Data Wrangler is an integrated data preparation tool within SageMaker Studio for transforming tabular and image data into ML-ready features -- without writing custom ETL code.

Capabilities:

  • Import data from Amazon S3, Redshift, Athena, and other sources
  • Preview, explore, and visualize data with graphs and statistics
  • Cleanse data: handle missing values, fix data types, remove duplicates
  • Transform data: apply functions, encode categoricals, normalize values
  • Perform feature engineering: derive new columns from existing ones
  • Quick Model analysis: run a fast model to evaluate if features are predictive
  • SQL support: query and transform data using SQL syntax
  • Data quality checks: detect missing values, format errors, and statistical anomalies
  • Export data flows to S3 or directly to SageMaker Pipelines for automation

Exam Signal: Whenever the scenario involves transforming or preparing data for ML -> think Data Wrangler

Feature Store

Overview: SageMaker Feature Store is a centralized repository for storing, managing, discovering, and reusing ML features across teams and models.

Features (engineered data columns) take significant effort to create. Without a central store, different teams recreate the same features independently, leading to inconsistency and wasted work.

Capabilities:

  • Store features from Data Wrangler or any other source
  • Define feature transformations directly in the store
  • Make features discoverable and shareable across teams
  • Provide consistent features for both training and real-time inference
  • Access features from within SageMaker Studio

Feature Engineering Example

  • Raw: Birth date column (string)
  • Engineered: Age column (integer) -- more useful as a numeric ML feature
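
The birth-date example reduces to a few lines of feature-engineering code. This is a hand-rolled sketch; in practice Data Wrangler would apply an equivalent transform in the data flow:

```python
from datetime import date

def age_from_birth_date(birth_date_str, on=None):
    """Engineer a numeric age feature from a raw 'YYYY-MM-DD' string."""
    on = on or date.today()
    born = date.fromisoformat(birth_date_str)
    # Subtract one year if the birthday hasn't occurred yet in year `on`
    return on.year - born.year - ((on.month, on.day) < (born.month, born.day))

print(age_from_birth_date("1990-06-15", on=date(2025, 6, 14)))  # -> 34
print(age_from_birth_date("1990-06-15", on=date(2025, 6, 15)))  # -> 35
```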

Key Terms

  • SageMaker Data Wrangler -- An integrated data preparation tool in SageMaker Studio for importing, visualizing, cleansing, transforming, and engineering features from raw data, without writing custom ETL code.
  • Feature Engineering -- The process of transforming raw data into more informative ML inputs. Example: converting a birth date string into a numeric age value that a model can learn from more effectively.
  • SageMaker Feature Store -- A centralized, managed repository in SageMaker for storing, discovering, sharing, and reusing ML features across models and teams, ensuring consistency between training and inference.

Exam Tips:
  • Data Wrangler = prepare and transform data. Feature Store = store and reuse those features.
  • 'Transform data before ML training' scenario -> Data Wrangler.
  • 'Centralized place to store and reuse features across teams' scenario -> Feature Store.
  • Data Wrangler can export directly to Feature Store or SageMaker Pipelines.

Practice Questions

Q1. A data engineering team wants to build a shared library of ML features (customer age, purchase frequency, lifetime value) so multiple model teams can reuse them consistently without re-deriving them independently. Which SageMaker component is designed for this?

  • SageMaker Data Wrangler -- to transform features from raw data
  • SageMaker Feature Store -- to centralize, share, and reuse features across teams and models
  • SageMaker Pipelines -- to automate the feature creation workflow
  • SageMaker Model Registry -- to track versions of each feature set

Answer: B

SageMaker Feature Store is a centralized repository for storing engineered ML features, making them discoverable and reusable across different model teams. It ensures training and inference use the same feature definitions, preventing inconsistency.

Q2. What is feature engineering?

  • The process of selecting the best ML algorithm
  • Transforming raw data into more informative ML inputs
  • Deploying a model to production
  • Evaluating model accuracy

Answer: B

Feature engineering is the process of transforming raw data into more informative ML inputs. Example: converting a birth date string into a numeric age value that a model can learn from more effectively.

Q3. Which SageMaker tool allows you to import, visualize, cleanse, and transform data without writing custom ETL code?

  • SageMaker Feature Store
  • SageMaker Data Wrangler
  • SageMaker Model Monitor
  • SageMaker Ground Truth

Answer: B

SageMaker Data Wrangler is an integrated data preparation tool within SageMaker Studio for transforming tabular and image data into ML-ready features -- without writing custom ETL code.

Q4. Where can Data Wrangler export prepared data flows to?

  • Only to Amazon S3
  • To S3 or directly to SageMaker Pipelines for automation
  • Only to SageMaker Feature Store
  • Only to a local file system

Answer: B

Data Wrangler can export data flows to Amazon S3 or directly to SageMaker Pipelines for automation. It can also export features to Feature Store for sharing and reuse.

Q5. A data scientist notices that birth date is stored as a string in the training data, but the model would benefit from having the person's age as a numeric value. Which process addresses this?

  • Model deployment
  • Feature engineering
  • Hyperparameter tuning
  • Model monitoring

Answer: B

Feature engineering is the process of transforming raw data (like a birth date string) into more informative ML inputs (like a numeric age column). This transformation makes the data more useful for model training.

Model Evaluation -- SageMaker Clarify

SageMaker Clarify provides three distinct capabilities: comparing foundation models, explaining model predictions, and detecting bias in datasets and models.

Capabilities

Model Evaluation

Description: Compare how two or more foundation models perform on a specific set of tasks using quantitative metrics.

How: Define evaluation tasks -> Clarify evaluates models on those tasks -> Returns scores per metric per model

Evaluation Types:

  • Human evaluation: Friendliness, humor, tone (requires human reviewers)
  • Automated metrics: Relevance, accuracy, brand voice alignment

Human Reviewers:

  • AWS-managed workforce
  • Your own employees

Datasets: Use built-in benchmark datasets or bring your own prompts and questions
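
A toy version of this comparison loop: two stand-in "models" are scored on the same task set with an exact-match accuracy metric. Everything here is illustrative; Clarify runs this against real foundation models with richer metrics and human-review options:

```python
def evaluate(models, tasks):
    """Score each candidate model on the same task set using exact-match
    accuracy, mimicking Clarify's per-model, per-metric comparison report."""
    report = {}
    for name, model in models.items():
        correct = sum(model(prompt) == expected for prompt, expected in tasks)
        report[name] = correct / len(tasks)
    return report

# Hypothetical stand-ins for two foundation models (lookup tables, not LLMs)
model_a = {"capital of France?": "Paris", "2+2?": "4"}.get
model_b = {"capital of France?": "Paris", "2+2?": "5"}.get

tasks = [("capital of France?", "Paris"), ("2+2?", "4")]
print(evaluate({"model-a": model_a, "model-b": model_b}, tasks))
# -> {'model-a': 1.0, 'model-b': 0.5}
```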

Model Explainability

Description: Understand WHY a model made a specific prediction by identifying which input features had the most influence on the output.

Example: A loan rejection model highlights that 'credit score', 'loan amount', and 'debt-to-income ratio' were the top three factors driving the rejection decision.

Benefits:

  • Debug incorrect predictions
  • Increase model transparency and stakeholder trust
  • Satisfy regulatory explainability requirements
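
For a linear model, feature influence can be read directly as weight times value, which is a useful mental model here even though Clarify itself computes attributions with SHAP (Shapley values), a method that also handles non-linear models. The weights and applicant below are hypothetical:

```python
def feature_attributions(weights, inputs):
    """For a linear model, each feature's contribution to the score is simply
    weight * value. Returns features ranked by absolute contribution,
    most influential first."""
    contrib = {f: weights[f] * v for f, v in inputs.items()}
    return sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical loan model: negative contributions push toward rejection
weights = {"credit_score": 0.004, "loan_amount": -0.00002, "debt_to_income": -1.5}
applicant = {"credit_score": 580, "loan_amount": 40_000, "debt_to_income": 0.62}

for feature, contribution in feature_attributions(weights, applicant):
    print(f"{feature}: {contribution:+.3f}")
```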

Bias Detection

Description: Automatically detect and measure statistical bias in training datasets and trained models using built-in bias metrics.

How: Specify which input features to analyze -> Clarify automatically calculates bias metrics

Example Bias Types:

  • Class Imbalance: One demographic group is substantially over- or under-represented in the training data
  • Label Imbalance: Outcomes are disproportionately assigned to certain groups
  • Feature distribution mismatch: Different distributions for protected attributes

Output: Statistical bias scores per feature, highlighting which features contribute most to unfair model behavior
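
The class-imbalance idea reduces to a simple ratio. One standard formulation (Clarify's pre-training class imbalance metric takes this form, to my understanding) is (n_a - n_d) / (n_a + n_d) for group sizes n_a and n_d:

```python
def class_imbalance(n_a, n_d):
    """Class imbalance between an advantaged group (n_a members) and a
    disadvantaged group (n_d members). Ranges from -1 to +1; 0 is balanced,
    and values near +1 mean group A dominates the dataset."""
    return (n_a - n_d) / (n_a + n_d)

# 900 samples from group A vs. 100 from group D: heavily skewed toward A
print(class_imbalance(900, 100))  # -> 0.8
```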

Key Terms

  • SageMaker Clarify -- A SageMaker tool with three capabilities: (1) compare foundation models on task performance, (2) explain which features drove a specific prediction, (3) detect statistical bias in datasets and models.
  • Model Explainability (Clarify) -- Clarify's ability to show which input features had the most influence on a specific model prediction, enabling debugging, trust, and regulatory compliance.
  • Bias Detection (Clarify) -- Clarify's automated measurement of statistical bias in training data and models, identifying whether certain groups are over- or under-represented in a way that could lead to unfair outcomes.
  • Class Imbalance -- A type of dataset bias where one group or outcome category is substantially more represented than another, causing the model to be better calibrated for the majority group.

Exam Tips:
  • Clarify has three roles: model comparison, model explainability, and bias detection.
  • 'Why did the model make this prediction?' -> Clarify model explainability (feature importance).
  • 'Detect bias or unfairness in model data' -> Clarify bias detection.
  • 'Compare model A vs model B on performance metrics' -> Clarify model evaluation.
  • Clarify is integrated directly into SageMaker Studio.

Practice Questions

Q1. A financial institution deployed a loan approval model and discovered it may be making decisions influenced by an applicant's zip code -- a potentially discriminatory proxy variable. Which SageMaker tool can automatically identify and measure this bias?

  • SageMaker Model Monitor -- to detect data drift in production
  • SageMaker Ground Truth -- to relabel training data with human feedback
  • SageMaker Clarify -- to automatically detect and measure bias across specified input features
  • SageMaker Data Wrangler -- to remove the zip code column from the dataset

Answer: C

SageMaker Clarify's bias detection capability automatically measures statistical bias in datasets and models. By specifying zip code as a feature of interest, Clarify will calculate bias metrics showing how much influence that feature has on outcomes -- surfacing potential discriminatory patterns.

Q2. What are the three main capabilities of SageMaker Clarify?

  • Data preparation, model training, and deployment
  • Model comparison, model explainability, and bias detection
  • Feature engineering, hyperparameter tuning, and inference
  • Data labeling, model versioning, and pipeline automation

Answer: B

SageMaker Clarify provides three distinct capabilities: (1) comparing foundation models on performance, (2) explaining which features drove a specific prediction, and (3) detecting statistical bias in datasets and models.

Q3. A team wants to understand WHY their loan rejection model rejected a specific applicant. Which SageMaker Clarify capability should they use?

  • Bias detection
  • Model evaluation
  • Model explainability (feature importance)
  • Data quality analysis

Answer: C

Model explainability in Clarify identifies which input features had the most influence on a specific prediction. It might show that credit score, loan amount, and debt-to-income ratio were the top three factors driving the rejection decision.

Q4. What is class imbalance in the context of ML bias?

  • When a model is too complex
  • When one demographic group is substantially over- or under-represented in training data
  • When the model makes too many predictions
  • When training takes too long

Answer: B

Class imbalance is a type of dataset bias where one group or outcome category is substantially more represented than another. This causes the model to be better calibrated for the majority group and potentially unfair to minority groups.

Q5. A company wants to compare how two foundation models perform on customer service tasks using both automated metrics and human evaluation. Which SageMaker tool enables this?

  • SageMaker Ground Truth
  • SageMaker Clarify model evaluation
  • SageMaker Model Monitor
  • SageMaker Autopilot

Answer: B

SageMaker Clarify's model evaluation capability allows you to compare how two or more foundation models perform on specific tasks using quantitative metrics. It supports both automated evaluation and human evaluation with AWS-managed or employee reviewers.

Human Feedback -- SageMaker Ground Truth

SageMaker Ground Truth is the primary SageMaker service for incorporating human judgment into ML workflows. It supports both data labeling (creating training datasets) and model alignment through human feedback.

RLHF -- Reinforcement Learning from Human Feedback. Humans evaluate model outputs and provide preference signals that train a reward model, aligning AI behavior with human expectations.

Use Cases

  • Data labeling: Assign labels to images, text, or audio to create supervised learning datasets
  • Model review: Have humans evaluate and grade model outputs for quality and correctness
  • Model alignment: Incorporate human preferences (e.g., business-appropriate tone) that automated training alone cannot capture
  • Customization: Fine-tune model behavior toward specific human-defined preferences

Example: A set of photos is sent to reviewers. Each reviewer labels each photo as 'dog', 'cat', or 'ship'. These labeled images become the training dataset.
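
When several reviewers label the same item, their votes must be consolidated into a single label. A majority-vote sketch (Ground Truth's actual annotation consolidation is configurable and can weight reviewer reliability rather than counting votes equally):

```python
from collections import Counter

def consolidate(annotations):
    """Majority-vote consolidation: each item's final label is the one most
    reviewers chose. A simplified stand-in for Ground Truth's consolidation."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

votes = {
    "photo1.jpg": ["dog", "dog", "cat"],
    "photo2.jpg": ["cat", "cat", "cat"],
}
print(consolidate(votes))  # -> {'photo1.jpg': 'dog', 'photo2.jpg': 'cat'}
```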

Workforce Options

  • Your own employees or internal team
  • Third-party contractors via Amazon Mechanical Turk
  • Pre-screened vendor workforce from AWS Marketplace

SageMaker Ground Truth Plus is the managed workforce feature within Ground Truth that orchestrates data labeling workflows at scale using the workforce options above.

Key Terms

  • SageMaker Ground Truth -- A SageMaker service for building labeled training datasets and incorporating human feedback into ML workflows. Supports RLHF, data annotation, model grading, and model alignment.
  • RLHF (Reinforcement Learning from Human Feedback) -- A training technique where human evaluators rate or rank model outputs, and those preferences are used to train a reward model that guides the AI toward human-aligned behavior.
  • SageMaker Ground Truth Plus -- The managed workforce capability within SageMaker Ground Truth that handles large-scale data labeling tasks using employees, MTurk workers, or vendor contractors.

Exam Tips:
  • RLHF at the exam -> SageMaker Ground Truth is a strong candidate.
  • Ground Truth = human feedback for labeling data AND aligning model behavior.
  • Ground Truth Plus = the managed workforce side of Ground Truth (labeling at scale).
  • Know the three workforce types: employees, Mechanical Turk, vendor marketplace.

Practice Questions

Q1. A company wants to fine-tune a customer-facing AI assistant to be more professional and business-appropriate in tone. Automated training alone hasn't achieved the desired behavior. Which approach using SageMaker is MOST appropriate?

  • SageMaker Automatic Model Tuning -- to search hyperparameter configurations for better tone
  • SageMaker Clarify -- to detect tone bias in model outputs
  • SageMaker Ground Truth with RLHF -- to have human reviewers provide preference feedback that aligns the model to a business-appropriate tone
  • SageMaker Data Wrangler -- to transform training text into more formal language

Answer: C

RLHF via SageMaker Ground Truth is the correct approach for aligning a model to human preferences -- in this case, a professional business tone. Human reviewers evaluate model responses, and their preference signals are used to reward and reinforce desired behavior during training.

Q2. What is RLHF?

  • Random Learning with High Frequency
  • Reinforcement Learning from Human Feedback
  • Real-time Low-latency High-throughput Framework
  • Regression Learning for Hyperparameter Fitting

Answer: B

RLHF stands for Reinforcement Learning from Human Feedback. It's a training technique where human evaluators rate or rank model outputs, and those preferences are used to train a reward model that guides the AI toward human-aligned behavior.

Q3. What are the three workforce options available in SageMaker Ground Truth?

  • Automated bots, AI reviewers, and machine learning models
  • Your own employees, Amazon Mechanical Turk workers, and vendor workforce from AWS Marketplace
  • Only internal employees
  • Only crowdsourced workers

Answer: B

SageMaker Ground Truth supports three workforce options: (1) your own employees or internal team, (2) crowdsourced workers via Amazon Mechanical Turk, and (3) a pre-screened vendor workforce from AWS Marketplace.

Q4. A company needs to create labeled training data for a supervised learning model. Which SageMaker service is designed for data labeling?

  • SageMaker Clarify
  • SageMaker Ground Truth
  • SageMaker Data Wrangler
  • SageMaker Feature Store

Answer: B

SageMaker Ground Truth is the primary service for building labeled training datasets. It supports data labeling for images, text, audio, and other data types, using human reviewers from various workforce options.

Q5. What is Ground Truth Plus?

  • A premium pricing tier for Ground Truth
  • The managed workforce feature within Ground Truth for large-scale data labeling
  • A separate AWS service for data validation
  • An advanced model training feature

Answer: B

SageMaker Ground Truth Plus is the managed workforce capability within Ground Truth: AWS sets up and operates the data labeling workflow at scale, supplying and managing the labeling workforce for you.

ML Governance -- Cards, Dashboard, Monitor, Registry, Role Manager

SageMaker provides a suite of governance tools to document, track, monitor, version, and control access to ML models throughout their lifecycle.

Tools

Model Cards

Purpose: Create structured documentation for each ML model

Contents:

  • Intended uses and limitations
  • Risk rating
  • Training methodology and data sources
  • Evaluation results

Benefit: Enables auditability and informed decision-making before and after deployment
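As a rough illustration of what those contents look like as structured data (field names and values here are simplified placeholders, not the exact Model Card schema):

```python
# Illustrative sketch of Model Card contents. Field names are simplified
# placeholders; the numbers and paths are invented for illustration.

model_card = {
    "model_name": "loan-approval-v3",
    "risk_rating": "High",  # customer-impacting financial decision
    "intended_uses": "Rank consumer loan applications; not for credit-limit setting",
    "training": {
        "methodology": "XGBoost binary classification",
        "data_sources": ["s3://example-bucket/loans-2023/"],
    },
    "evaluation": {"auc": 0.91, "f1": 0.84},  # illustrative metrics only
}
```

Capturing the risk rating and intended-use limits up front is what makes the card useful to auditors after deployment.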

Model Dashboard

Purpose: Centralized view of all models across SageMaker

Capabilities:

  • Track which models are actively serving production traffic
  • View risk ratings, data quality, and model quality scores for all models
  • Surface models that violate defined thresholds for quality or bias
  • Accessible directly from the SageMaker console

Model Monitor

Purpose: Continuously or periodically monitor deployed model quality in production

Monitoring Targets:

  • Data quality drift -- input feature distributions have shifted
  • Model quality drift -- predictions are less accurate over time
  • Bias drift -- model has become more biased than the original baseline
  • Explainability drift -- feature importance has changed significantly

When monitored metrics exceed defined thresholds, an alert is triggered. The team then retrains or recalibrates the model.

Example: A loan approval model starts approving applicants with low credit scores six months after deployment -> Model Monitor detects the quality drift and alerts the team.
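That alert flow is configured as a monitoring schedule. Below is a hedged sketch following the shape of boto3's CreateMonitoringSchedule request -- names are placeholders and the request is abridged (a full schedule also references a baseline and a monitoring job definition):

```python
# Illustrative sketch: an hourly data-quality monitoring schedule for a
# deployed endpoint. Names are placeholders; the request is abridged.

def build_monitoring_schedule(endpoint_name: str) -> dict:
    return {
        "MonitoringScheduleName": f"{endpoint_name}-data-quality",
        "MonitoringScheduleConfig": {
            # Run once per hour; Model Monitor compares captured endpoint
            # traffic against a baseline and reports violations on drift.
            "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"},
            # The four monitoring types map to the four drift targets above:
            # DataQuality, ModelQuality, ModelBias, ModelExplainability.
            "MonitoringType": "DataQuality",
        },
    }

schedule = build_monitoring_schedule("loan-approval-endpoint")
```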

Model Registry

Purpose: Centralized repository to track, version, and manage all ML models

Capabilities:

  • Store all model versions in one catalog with associated metadata
  • Implement an approval workflow: registered model versions must be approved before deployment
  • Automate deployment pipelines by triggering actions on new model approvals
  • Share models across teams within the organization
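The approval workflow can be sketched as a model-package registration request, following the shape of boto3's CreateModelPackage call. All names, ARNs, and the container image below are hypothetical, and the request is abridged:

```python
# Illustrative sketch: registering a model version into a Model Package Group
# with approval gating. Everything below is a placeholder.

def build_register_request(group: str, model_data_url: str, image_uri: str) -> dict:
    return {
        "ModelPackageGroupName": group,  # one group holds all versions of one model
        "InferenceSpecification": {
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_url}],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
        # New versions start unapproved; flipping the status to "Approved"
        # later is the event that can trigger an automated deployment pipeline.
        "ModelApprovalStatus": "PendingManualApproval",
    }

req = build_register_request(
    "churn-model",
    "s3://example-bucket/model.tar.gz",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/churn:latest",
)
```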

Role Manager

Purpose: Define and enforce IAM permissions for SageMaker personas

Personas:

  • Data Scientist
  • MLOps Engineer
  • Data Engineer

Benefit: Ensures principle of least privilege -- each role only accesses what it needs within SageMaker

Key Terms

  • SageMaker Model Cards -- Structured documentation templates for ML models capturing intended uses, risk ratings, training details, and evaluation results -- used for auditability and governance.
  • SageMaker Model Dashboard -- A centralized portal in the SageMaker console showing all models, their deployment status, quality metrics, and threshold violations at a glance.
  • SageMaker Model Monitor -- A SageMaker service that continuously or periodically checks deployed model quality -- detecting data drift, model quality degradation, bias drift, and explainability drift.
  • Model Drift -- The degradation of a deployed model's performance or fairness over time, typically caused by changes in real-world data patterns that diverge from the training distribution.
  • SageMaker Model Registry -- A centralized catalog for versioning, tracking, and managing ML models, with approval workflows that gate deployment and can trigger automated pipelines.
  • SageMaker Role Manager -- A tool for defining and applying IAM permissions to different SageMaker user personas (data scientists, MLOps engineers, etc.) to enforce access control.
Exam Tips:
  • Model Cards = document a model (intended use, risk, training details).
  • Model Dashboard = view ALL models in one place with quality/risk metrics.
  • Model Monitor = real-time or scheduled alerts when deployed model quality degrades.
  • Model Registry = version and approval workflow for models before deployment.
  • Role Manager = access control and permissions for SageMaker users.
  • 'Model drifting in production' scenario -> Model Monitor.

Practice Questions

Q1. Six months after deployment, a fraud detection model begins missing obvious fraud cases. The team wants to automatically detect when model prediction accuracy drops below an acceptable threshold. Which SageMaker tool should they configure?

  • SageMaker Clarify -- to re-evaluate feature importance in the model
  • SageMaker Model Monitor -- to continuously monitor model quality and trigger alerts on threshold violations
  • SageMaker Model Registry -- to roll back to a previous model version
  • SageMaker Model Dashboard -- to manually compare model versions

Answer: B

SageMaker Model Monitor is the service that continuously or periodically evaluates deployed model quality and triggers alerts when metrics fall below defined thresholds. This allows the team to detect model drift early and take corrective action before it impacts business outcomes.

Q2. What is model drift?

  • When a model is deployed to a different region
  • The degradation of model performance over time due to changes in real-world data patterns
  • When a model is retrained with new data
  • When model parameters are manually adjusted

Answer: B

Model drift is the degradation of a deployed model's performance or fairness over time, typically caused by changes in real-world data patterns that diverge from the training distribution.

Q3. Which SageMaker tool provides structured documentation for ML models including intended uses, risk ratings, and training details?

  • SageMaker Model Registry
  • SageMaker Model Dashboard
  • SageMaker Model Cards
  • SageMaker Model Monitor

Answer: C

SageMaker Model Cards create structured documentation for each ML model, including intended uses and limitations, risk rating, training methodology and data sources, and evaluation results. This enables auditability and informed decision-making.

Q4. A company wants a centralized view of all their ML models showing deployment status, quality metrics, and threshold violations. Which SageMaker tool provides this?

  • SageMaker Model Cards
  • SageMaker Model Dashboard
  • SageMaker Model Registry
  • SageMaker Clarify

Answer: B

SageMaker Model Dashboard is a centralized portal in the SageMaker console showing all models, their deployment status, quality metrics, and threshold violations at a glance.

Q5. What types of drift can SageMaker Model Monitor detect?

  • Only data quality drift
  • Data quality, model quality, bias drift, and explainability drift
  • Only model accuracy changes
  • Only infrastructure changes

Answer: B

SageMaker Model Monitor can detect multiple types of drift: data quality drift (input feature distributions shifted), model quality drift (predictions less accurate), bias drift (model became more biased), and explainability drift (feature importance changed).

MLOps -- SageMaker Pipelines

SageMaker Pipelines provides CI/CD (Continuous Integration and Continuous Delivery) for machine learning. It automates the entire model lifecycle from data processing through deployment, enabling repeatable, auditable ML workflows.

Analogy: Just as software CI/CD pipelines automate code testing and deployment, SageMaker Pipelines automates ML model building, evaluation, and deployment.

Benefits

  • Iterate faster -- changes trigger automatic retraining and redeployment
  • Reduce human error -- no manual steps between pipeline stages
  • Ensure reproducibility -- every pipeline run is logged and auditable
  • Scale training -- automatically build and evaluate hundreds of model configurations

Pipeline Step Types

Processing

Description: Data processing and feature engineering

Training

Description: Model training with specified algorithm and data

Tuning

Description: Hyperparameter optimization (AMT)

AutoML

Description: Automatically train a model using SageMaker Autopilot

Model

Description: Create or register a SageMaker model (e.g., push to Model Registry)

ClarifyCheck

Description: Check for bias, model explainability, or drift versus a baseline

QualityCheck

Description: Validate data quality or model quality against a baseline

Typical Pipeline Order: Processing -> Training -> Tuning -> Model -> ClarifyCheck -> QualityCheck -> Deploy
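Under the hood, a pipeline is serialized to a JSON definition listing named, typed steps with dependencies. A simplified sketch of that structure (step names are hypothetical, and real definitions carry full Arguments per step):

```python
# Illustrative, abridged pipeline definition mirroring the typical step order.
# Step names are hypothetical; real definitions include per-step Arguments.

pipeline_definition = {
    "Version": "2020-12-01",  # pipeline definition schema version
    "Steps": [
        {"Name": "PrepareData",   "Type": "Processing"},
        {"Name": "TrainModel",    "Type": "Training",     "DependsOn": ["PrepareData"]},
        {"Name": "TuneModel",     "Type": "Tuning",       "DependsOn": ["TrainModel"]},
        {"Name": "RegisterModel", "Type": "Model",        "DependsOn": ["TuneModel"]},
        {"Name": "BiasCheck",     "Type": "ClarifyCheck", "DependsOn": ["RegisterModel"]},
        {"Name": "QualityGate",   "Type": "QualityCheck", "DependsOn": ["BiasCheck"]},
    ],
}

# The DependsOn edges encode the order: process -> train -> tune -> register
# -> clarify-check -> quality-check; deployment follows on approval.
step_types = [s["Type"] for s in pipeline_definition["Steps"]]
```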

Key Terms

  • SageMaker Pipelines -- A managed CI/CD service for ML workflows that automates model building, training, evaluation, and deployment through configurable, auditable pipeline steps.
  • MLOps -- Machine Learning Operations: the practice of applying DevOps principles (automation, CI/CD, monitoring) to the ML lifecycle to improve speed, reliability, and reproducibility.
  • ClarifyCheck (Pipeline Step) -- A SageMaker Pipelines step that automatically runs SageMaker Clarify checks for bias, model explainability, or drift as part of an automated training pipeline.
Exam Tips:
  • SageMaker Pipelines = CI/CD for ML. Automates the full lifecycle from data prep to deployment.
  • MLOps = applying DevOps automation principles to machine learning.
  • Know the pipeline step types and their approximate order.
  • ClarifyCheck and QualityCheck steps integrate governance directly into the automated pipeline.

Practice Questions

Q1. An ML platform team wants to ensure that every time new training data is available, the model is automatically retrained, evaluated for bias, quality-checked, and deployed to production -- all without manual intervention. Which SageMaker feature enables this?

  • SageMaker Automatic Model Tuning -- to find the best model automatically
  • SageMaker Pipelines -- to define an automated end-to-end ML workflow with CI/CD
  • SageMaker Canvas -- to build models without code
  • SageMaker Ground Truth -- to incorporate human feedback in the retraining loop

Answer: B

SageMaker Pipelines provides CI/CD automation for ML. A pipeline can be configured to trigger on new data, run through processing, training, ClarifyCheck (bias), QualityCheck, and deployment steps automatically -- eliminating manual intervention from the MLOps workflow.

Q2. What is MLOps?

  • A machine learning algorithm
  • Applying DevOps principles to the ML lifecycle for automation, CI/CD, and monitoring
  • A SageMaker pricing tier
  • Manual model deployment practices

Answer: B

MLOps (Machine Learning Operations) is the practice of applying DevOps principles -- automation, CI/CD, monitoring -- to the ML lifecycle to improve speed, reliability, and reproducibility.

Q3. What is the ClarifyCheck step in a SageMaker Pipeline?

  • A step that checks data storage costs
  • A step that automatically runs SageMaker Clarify checks for bias, explainability, or drift
  • A step that validates user credentials
  • A step that compresses model artifacts

Answer: B

ClarifyCheck is a SageMaker Pipelines step that automatically runs SageMaker Clarify checks for bias, model explainability, or drift as part of an automated training pipeline.

Q4. What are the main benefits of using SageMaker Pipelines?

  • Faster iteration, reduced human error, reproducibility, and scaled training
  • Lower storage costs and faster downloads
  • Simpler data visualization
  • Manual control over each pipeline step

Answer: A

SageMaker Pipelines benefits include: faster iteration (changes trigger automatic retraining), reduced human error (no manual steps), reproducibility (every run is logged), and scaled training (automatically build hundreds of model configurations).

Q5. What is the typical order of steps in a SageMaker Pipeline?

  • Deploy -> Train -> Process
  • Processing -> Training -> Tuning -> Model -> ClarifyCheck -> QualityCheck -> Deploy
  • QualityCheck -> Deploy -> Training
  • There is no typical order

Answer: B

The typical order is: Processing (data prep) -> Training -> Tuning (hyperparameter optimization) -> Model (create/register) -> ClarifyCheck (bias/explainability) -> QualityCheck -> Deploy.

Accelerated Model Access -- JumpStart & Canvas

JumpStart

Overview: SageMaker JumpStart is a model hub that provides access to pre-trained foundation models and pre-built ML solutions, enabling teams to get started quickly without building from scratch.

Two Components

Model Hub

Description: Browse and deploy pre-trained models from leading providers

Model Sources:

  • Hugging Face
  • Meta (Llama)
  • Databricks
  • Stability AI
  • DeepSeek

Breadth: A significantly larger model catalog than Amazon Bedrock

Customization: Models can be fine-tuned with your own data before deployment

Deployment: Deployed directly on SageMaker -- you retain full control of deployment options

ML Solutions

Description: Pre-built, end-to-end ML templates for common business scenarios

Examples:

  • Demand forecasting
  • Credit risk prediction
  • Fraud detection
  • Computer vision

Flow: Select solution -> Customize with your data -> Deploy

Workflow: Browse models/solutions -> Experiment -> Customize (fine-tune) -> Deploy
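The browse-customize-deploy workflow maps to a few SDK calls. A sketch assuming the sagemaker Python SDK and configured AWS credentials (the model_id below is a hypothetical placeholder -- browse JumpStart in Studio for real IDs):

```python
# Illustrative sketch, assuming the sagemaker Python SDK is installed and AWS
# credentials are configured. The model_id is a hypothetical placeholder.

def deploy_jumpstart_model(model_id: str, instance_type: str = "ml.g5.xlarge"):
    """Deploy a pre-trained JumpStart model to a real-time endpoint."""
    from sagemaker.jumpstart.model import JumpStartModel  # deferred import

    model = JumpStartModel(model_id=model_id)
    # Because the model runs on SageMaker, you keep full control of the
    # deployment options (instance type, instance count, and so on).
    predictor = model.deploy(instance_type=instance_type, initial_instance_count=1)
    return predictor

# Usage (launches billable infrastructure, so not executed here):
# predictor = deploy_jumpstart_model("example-provider-example-model")
```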

Canvas

Overview: SageMaker Canvas is a no-code visual interface for building and deploying ML models -- designed for business analysts and non-developers who want ML capabilities without writing code.

How It Works:

  • Upload a dataset and select the column to predict
  • Canvas walks through the model building process automatically
  • Powered by SageMaker Autopilot (AutoML) under the hood
  • Data transformations use Data Wrangler internally
  • Access pre-packaged models from Bedrock or JumpStart

Ready-to-Use Models

Sentiment Analysis

Powered By: Amazon Comprehend

Object Detection in Images

Powered By: Amazon Rekognition

Document Text Extraction

Powered By: Amazon Textract

Integration: Part of SageMaker Studio -- accessible directly from the Studio interface

Key Terms

  • SageMaker JumpStart -- A model hub and solution catalog within SageMaker that provides pre-trained foundation models (from Hugging Face, Meta, etc.) and pre-built ML solutions for rapid deployment and customization.
  • SageMaker Canvas -- A no-code visual ML interface within SageMaker Studio that allows non-developers to build and deploy ML models by selecting data and prediction targets -- no programming required.
  • SageMaker Autopilot -- The AutoML engine powering SageMaker Canvas that automatically selects algorithms, engineers features, trains models, and tunes hyperparameters -- all without manual configuration.
Exam Tips:
  • JumpStart = model hub with pre-trained models + pre-built solutions. More models than Bedrock.
  • Canvas = no-code ML interface. The exam signal is 'build ML models without writing code'.
  • Canvas uses Autopilot (AutoML) and Data Wrangler internally.
  • Canvas integrates directly with Rekognition, Comprehend, and Textract for ready-to-use AI tasks.
  • JumpStart models can be fully customized (fine-tuned) before deployment.

Practice Questions

Q1. A retail business analyst (non-developer) wants to build a demand forecasting model using last year's sales data -- without writing any code. Which SageMaker feature enables this?

  • SageMaker JumpStart -- to access a pre-built demand forecasting solution
  • SageMaker Canvas -- to build the forecasting model through a no-code visual interface
  • SageMaker Data Wrangler -- to prepare the sales data for training
  • SageMaker Automatic Model Tuning -- to automatically find the best forecasting algorithm

Answer: B

SageMaker Canvas is specifically designed for non-developers who want to build ML models without coding. The analyst can upload sales data, select the demand column to forecast, and Canvas handles the entire model building process automatically using Autopilot under the hood.

Q2. What is SageMaker JumpStart?

  • A data storage service
  • A model hub with pre-trained foundation models and pre-built ML solutions
  • A model monitoring service
  • A data labeling service

Answer: B

SageMaker JumpStart is a model hub that provides access to pre-trained foundation models from leading providers (Hugging Face, Meta, Stability AI) and pre-built ML solutions for common business scenarios like demand forecasting and fraud detection.

Q3. How does SageMaker Canvas build ML models internally?

  • It requires users to write Python code
  • It uses SageMaker Autopilot (AutoML) under the hood
  • It only supports pre-built models with no customization
  • It requires manual algorithm selection

Answer: B

SageMaker Canvas is powered by SageMaker Autopilot (AutoML) under the hood. When you select data and a prediction target, Canvas/Autopilot automatically selects algorithms, engineers features, trains models, and tunes hyperparameters.

Q4. What ready-to-use AI models does SageMaker Canvas provide access to?

  • Image generation only
  • Sentiment analysis (Comprehend), object detection (Rekognition), and document extraction (Textract)
  • Only custom-trained models
  • Translation services only

Answer: B

SageMaker Canvas integrates with AWS AI services to provide ready-to-use models including sentiment analysis (powered by Amazon Comprehend), object detection (powered by Amazon Rekognition), and document text extraction (powered by Amazon Textract).

Q5. Which has a larger model catalog -- SageMaker JumpStart or Amazon Bedrock?

  • Amazon Bedrock has significantly more models
  • SageMaker JumpStart has significantly more models
  • They have exactly the same models
  • Neither service provides pre-trained models

Answer: B

SageMaker JumpStart has a significantly larger model catalog than Amazon Bedrock, including models from Hugging Face, Meta (Llama), Databricks, Stability AI, DeepSeek, and many more providers.

Open Source Integration -- MLFlow on SageMaker

MLFlow is an open-source platform for managing the full ML lifecycle. SageMaker makes it easy to run MLFlow as a managed component within the AWS ecosystem.

What MLFlow Does

  • Track ML experiments -- log parameters, metrics, and results for each training run
  • Compare runs -- visually compare different experiments side by side
  • Manage model versions -- register and version trained models
  • Organize the ML workflow -- from experimentation to production
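Those capabilities map to a handful of MLFlow tracking calls. A sketch assuming the mlflow package is available (the tracking URI is a local placeholder -- with SageMaker's managed server you would point it at the Tracking Server instead):

```python
# Illustrative sketch of MLFlow's tracking API. The tracking URI is a
# placeholder; the import is deferred so the sketch reads standalone.

def log_training_run(params: dict, accuracy: float):
    import mlflow  # assumes the mlflow package is installed

    mlflow.set_tracking_uri("http://localhost:5000")  # placeholder URI
    with mlflow.start_run(run_name="experiment-1"):
        mlflow.log_params(params)                # hyperparameters for this run
        mlflow.log_metric("accuracy", accuracy)  # result, comparable across runs

# Usage: log_training_run({"learning_rate": 0.01, "epochs": 10}, accuracy=0.92)
```

Each call like this becomes a logged run that the MLFlow UI can compare side by side.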

SageMaker Integration

MLFlow Tracking Server

Description: SageMaker can launch a fully managed MLFlow Tracking Server in one click, allowing teams to use MLFlow's UI and APIs while AWS manages the underlying infrastructure.

Benefit: Teams who already use MLFlow can continue using familiar open-source tooling without managing servers.

Key Terms

  • MLFlow -- An open-source platform for ML experiment tracking, model versioning, and lifecycle management. SageMaker offers a managed MLFlow Tracking Server for teams who prefer open-source tooling.
  • MLFlow Tracking Server (SageMaker) -- A managed server launched within SageMaker that runs the MLFlow platform, allowing teams to track experiments and manage model versions without self-managing infrastructure.
Exam Tips:
  • MLFlow = open-source experiment tracking. SageMaker can run it as a managed Tracking Server.
  • You don't need deep MLFlow knowledge for the exam -- just know SageMaker supports it.
  • MLFlow is accessible from within SageMaker Studio.

Practice Questions

Q1. A team that already uses MLFlow for experiment tracking wants to continue using it on AWS without managing servers. What does SageMaker offer?

  • A migration tool to replace MLFlow with SageMaker-native tracking
  • A managed MLFlow Tracking Server that runs within SageMaker
  • MLFlow is not supported on AWS
  • A separate AWS account for MLFlow

Answer: B

SageMaker can launch a fully managed MLFlow Tracking Server in one click. This allows teams to use MLFlow's familiar UI and APIs while AWS manages the underlying infrastructure.

Q2. What does MLFlow do?

  • Manages AWS IAM permissions
  • Tracks ML experiments, compares runs, and manages model versions
  • Provides real-time inference endpoints
  • Labels training data using human reviewers

Answer: B

MLFlow is an open-source platform for managing the ML lifecycle. It tracks ML experiments (logging parameters, metrics, results), compares runs visually, manages model versions, and organizes the ML workflow from experimentation to production.

Q3. Where is MLFlow accessible from within SageMaker?

  • Only from the command line
  • From within SageMaker Studio
  • Only from a separate AWS console
  • MLFlow is not integrated with SageMaker

Answer: B

MLFlow is accessible from within SageMaker Studio, providing a seamless experience for teams who prefer open-source tooling alongside SageMaker's managed capabilities.

Q4. Why would a team choose to use MLFlow on SageMaker?

  • MLFlow is the only way to track experiments on AWS
  • They want to use familiar open-source tooling without managing infrastructure
  • MLFlow is required for SageMaker training jobs
  • MLFlow is cheaper than SageMaker

Answer: B

Teams who already use MLFlow can continue using familiar open-source tooling while AWS manages the underlying infrastructure. This provides continuity for existing workflows without the overhead of self-managing servers.

Q5. What is an MLFlow Tracking Server?

  • A model deployment endpoint
  • A managed server that runs the MLFlow platform for experiment tracking and model versioning
  • A data storage service
  • A hyperparameter tuning service

Answer: B

An MLFlow Tracking Server is a server that runs the MLFlow platform, allowing teams to track experiments and manage model versions. SageMaker offers this as a managed service, eliminating the need to self-manage infrastructure.

SageMaker Summary -- Quick Reference

Service Map

SageMaker Studio

Purpose: Unified web IDE for all SageMaker capabilities

Automatic Model Tuning (AMT)

Purpose: Automated hyperparameter optimization

Real-Time Inference

Purpose: Low-latency individual predictions (<=6 MB, <=60s)

Serverless Inference

Purpose: No-infrastructure real-time predictions (cold start risk)

Asynchronous Inference

Purpose: Near-real-time large payload processing (<=1 GB, <=1 hour)

Batch Transform

Purpose: Offline bulk predictions on entire datasets via S3

Data Wrangler

Purpose: No-code data preparation and feature engineering

Feature Store

Purpose: Centralized shared repository for ML features

Clarify

Purpose: Model comparison, explainability, and bias detection

Ground Truth

Purpose: Human-labeled data and RLHF model alignment

Model Cards

Purpose: Structured documentation for models

Model Dashboard

Purpose: Centralized view of all models and quality metrics

Model Monitor

Purpose: Production model quality drift alerts

Model Registry

Purpose: Version control and approval workflow for models

Pipelines

Purpose: CI/CD automation for end-to-end ML workflows (MLOps)

Role Manager

Purpose: IAM permissions for SageMaker personas

JumpStart

Purpose: Pre-trained model hub and pre-built ML solutions

Canvas

Purpose: No-code visual ML interface for non-developers

MLFlow on SageMaker

Purpose: Managed open-source experiment tracking server

Inference Decision Tree

Question: Which inference type?

Nodes

Need response immediately + small payload (<=6 MB)?

If Yes: -> Real-Time (if you want to manage scaling) or Serverless (if you want zero infra)

Large single payload (up to 1 GB) + can wait minutes?

If Yes: -> Asynchronous Inference

Processing an entire dataset with many records?

If Yes: -> Batch Transform
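The decision tree above can be expressed as a small helper, using the payload and latency limits quoted in this guide:

```python
# The inference decision tree as a helper function. Thresholds follow the
# limits in this guide: <=6 MB / <=60 s real-time, <=1 GB / <=1 h async.

def choose_inference_type(payload_mb: float, need_immediate: bool,
                          whole_dataset: bool, want_zero_infra: bool = False) -> str:
    if whole_dataset:
        return "Batch Transform"            # offline bulk predictions via S3
    if need_immediate and payload_mb <= 6:
        # Both options are real-time; serverless trades cold-start risk
        # for zero infrastructure management.
        return "Serverless Inference" if want_zero_infra else "Real-Time Inference"
    if payload_mb <= 1024:                  # ~1 GB
        return "Asynchronous Inference"     # large payload, near-real-time via S3
    raise ValueError("Payload exceeds supported limits")

choice = choose_inference_type(payload_mb=500, need_immediate=False, whole_dataset=False)
# -> "Asynchronous Inference"
```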

Exam Tips:
  • For any scenario: identify the keyword -> map to the SageMaker component.
  • Inference keyword map: 'no infrastructure' -> Serverless | 'large payload' -> Async | 'entire dataset' -> Batch | 'real-time now' -> Real-Time.
  • Governance keyword map: 'drift in production' -> Monitor | 'version control' -> Registry | 'document the model' -> Cards | 'approve before deploy' -> Registry | 'explain prediction' -> Clarify.
  • Data prep keyword map: 'transform data' -> Data Wrangler | 'share features' -> Feature Store.
  • Speed-to-deploy keyword map: 'no code' -> Canvas | 'pre-trained models' -> JumpStart.
  • Human feedback keyword map: 'label data' or 'RLHF' -> Ground Truth.

Practice Questions

Q1. Match each scenario to the correct SageMaker component: (1) A team discovers their deployed model has become 30% less accurate over 3 months. (2) A non-technical analyst wants to predict customer churn without writing code. (3) An organization wants to ensure all ML models have documented risk ratings. (4) A pipeline needs to automatically stop unpromising hyperparameter experiments.

  • (1) Model Monitor, (2) Canvas, (3) Model Cards, (4) AMT
  • (1) Clarify, (2) JumpStart, (3) Model Dashboard, (4) Pipelines
  • (1) Model Monitor, (2) JumpStart, (3) Model Registry, (4) Pipelines
  • (1) Ground Truth, (2) Canvas, (3) Model Dashboard, (4) AMT

Answer: A

(1) Model Monitor detects quality degradation in production models. (2) Canvas is the no-code ML interface for non-technical users. (3) Model Cards are the documentation templates for recording model risk, intended use, and training details. (4) Automatic Model Tuning (AMT) supports early stopping, which terminates unpromising hyperparameter experiments to save compute cost.

Q2. Which inference type should you use for processing a single large file (up to 1 GB) that can take up to 1 hour to process?

  • Real-Time Inference
  • Serverless Inference
  • Asynchronous Inference
  • Batch Transform

Answer: C

Asynchronous Inference supports payloads up to 1 GB and processing times up to 1 hour. It's designed for large single-payload processing with near-real-time results via S3.

Q3. What is the exam keyword mapping for 'no infrastructure management' when choosing a SageMaker inference type?

  • Batch Transform
  • Real-Time Inference
  • Serverless Inference
  • Asynchronous Inference

Answer: C

'No infrastructure management' is the exam signal for Serverless Inference. It automatically scales without configuration and requires no server management.

Q4. Which SageMaker tool should you use when you need to explain which features drove a specific prediction?

  • SageMaker Model Monitor
  • SageMaker Clarify
  • SageMaker Model Registry
  • SageMaker Data Wrangler

Answer: B

SageMaker Clarify provides model explainability -- showing which input features had the most influence on a specific prediction. This is critical for debugging, building trust, and meeting regulatory requirements.

Q5. Which keyword mapping indicates you should use SageMaker Ground Truth?

  • 'Transform data' or 'prepare features'
  • 'Label data' or 'RLHF'
  • 'Monitor drift' or 'quality degradation'
  • 'Version control' or 'approval workflow'

Answer: B

The keywords 'label data' or 'RLHF' (Reinforcement Learning from Human Feedback) indicate SageMaker Ground Truth. It's the service for data labeling and incorporating human feedback into ML workflows.

AWS AI Practitioner - Table of Contents
