AWS AI Practitioner - AWS Managed AI Services
Why AWS Managed AI Services?
Overview:
Before Amazon Bedrock existed, AWS built a portfolio of purpose-built, pre-trained AI services -- each solving a specific, narrow use case. These services are still heavily tested on the exam and widely used in production.
What Makes Managed AI Services Different from Bedrock:
| Feature | Managed AI Services | Amazon Bedrock |
|---|---|---|
| Purpose | Narrow, specific use case (e.g., transcription only) | General-purpose GenAI -- text, images, code |
| Model choice | Fixed -- AWS manages the model | Choose from 30+ Foundation Models |
| Setup | Zero configuration -- API ready | Requires model selection and prompt design |
| Training | Pre-trained -- works out of the box (some support customization) | Fine-tunable FMs |
| Pricing | Pay per request (tokens, minutes, pages, API calls) | Pay per token (on-demand) or provisioned throughput |
Why Use Managed Services?
- Availability -- deployed across multiple AWS Regions, always on
- Redundancy -- built across multiple Availability Zones (fault-tolerant)
- Performance -- specialized CPUs/GPUs embedded in the service infrastructure
- Cost efficiency -- pay only for what you use; no server over-provisioning needed
- No ML expertise required -- AWS handles model training, updates, and infrastructure
Service Map -- What Does What:
| Category | Service | Primary Function |
|---|---|---|
| Text/Document Processing | Amazon Comprehend | NLP -- entities, sentiment, PII, classification |
| Translation | Amazon Translate | Language translation |
| Speech-to-Text | Amazon Transcribe | Audio -> Text (ASR) |
| Text-to-Speech | Amazon Polly | Text -> Audio |
| Vision | Amazon Rekognition | Image/video analysis |
| Chatbots | Amazon Lex | Conversational AI chatbot builder |
| Recommendations | Amazon Personalize | Personalized product/content recommendations |
| Document Extraction | Amazon Textract | Extract text, forms, tables from documents |
| Document Search | Amazon Kendra | ML-powered enterprise document search |
| Human Tasks | Amazon Mechanical Turk | Crowdsourced human task workforce |
| Human Review | Amazon Augmented AI (A2I) | Human review of low-confidence ML predictions |
| Medical NLP | Amazon Comprehend Medical | NLP for medical text and PHI detection |
| Medical Speech | Amazon Transcribe Medical | Speech-to-text for medical terminology |
| Clinical Notes | AWS HealthScribe | Auto-generate clinical notes from patient conversations |
| AI Hardware | AWS Trainium / AWS Inferentia | Specialized ML training and inference chips on EC2 |
Key Terms
| Term | Definition |
|---|---|
| AWS Managed AI Services | Pre-trained, purpose-built AI services from AWS designed to solve specific AI tasks (e.g., translation, transcription, vision) without requiring customers to train or manage ML models. |
| Pre-trained Model | A machine learning model already trained by AWS on large datasets. Customers use it via API without any model training -- works out of the box for its specific task. |
- Managed AI services = narrow, specific purpose. Bedrock = general GenAI. Know the distinction.
- These services are pay-per-use (per API call, per minute of audio, per page) -- no server management needed.
- The exam frequently presents a scenario and asks which service to use -- the service map above is your key reference.
- Many services support CUSTOMIZATION on top of the pre-trained base (custom classifiers in Comprehend, custom vocabularies in Transcribe, custom labels in Rekognition).
Practice Questions
Q1. A startup wants to add sentiment analysis to their customer feedback application but has no ML expertise and needs to launch quickly. Which approach should they take?
- Train a custom model on Amazon SageMaker
- Use Amazon Comprehend, a pre-trained managed AI service
- Build a neural network from scratch on EC2
- Use Amazon Bedrock to create a custom foundation model
Answer: B
Amazon Comprehend is a managed AI service that provides sentiment analysis out-of-the-box with no ML expertise required. It's pre-trained and ready to use via API, perfect for quick deployment without custom model training.
Q2. What is the PRIMARY advantage of AWS Managed AI Services compared to training custom ML models?
- They provide more customization options
- They require no ML expertise and work out-of-the-box
- They are always free to use
- They support more programming languages
Answer: B
AWS Managed AI Services are pre-trained and API-ready, requiring no ML expertise to implement. They solve specific AI tasks (translation, transcription, vision) without customers needing to train or manage ML models themselves.
Q3. A company needs to process images for object detection AND translate customer reviews into multiple languages. How many AWS Managed AI Services are needed?
- One -- Amazon Bedrock handles both tasks
- Two -- Amazon Rekognition for images and Amazon Translate for translation
- One -- Amazon Comprehend handles all text and image tasks
- Three -- one for each task plus a coordination service
Answer: B
Each AWS Managed AI Service has a specific, narrow purpose. Amazon Rekognition handles image analysis (object detection), while Amazon Translate handles language translation. These are complementary services, each solving one specific task.
Q4. Which pricing model applies to AWS Managed AI Services?
- Fixed monthly subscription per service
- Pay-per-use based on API calls, minutes processed, or pages analyzed
- Upfront annual commitment required
- Free tier only with no paid options
Answer: B
AWS Managed AI Services use pay-per-use pricing -- you pay based on consumption (per API call, per minute of audio transcribed, per page of documents analyzed). There's no server provisioning or capacity management required.
Q5. A financial services company needs to extract data from invoices AND detect toxic content in customer communications. Which services should they use?
- Amazon Textract for document extraction and Amazon Comprehend for toxicity detection
- Amazon Kendra for both tasks
- Amazon Rekognition for both document and content analysis
- Amazon SageMaker to build custom models for both tasks
Answer: A
Amazon Textract extracts structured data (forms, tables, key-value pairs) from documents like invoices. Amazon Comprehend provides NLP capabilities including sentiment analysis and content classification for detecting toxic or inappropriate text in communications.
Amazon Comprehend
What is Amazon Comprehend?
Amazon Comprehend is a fully managed, serverless Natural Language Processing (NLP) service that uses machine learning to discover insights, relationships, and meaning in unstructured text.
Core Capabilities (Out-of-the-Box):
| Capability | What It Does | Example |
|---|---|---|
| Named Entity Recognition (NER) | Identify people, places, organizations, dates, quantities in text | 'Zhang Wei' -> Person, 'July 31st' -> Date |
| Sentiment Analysis | Determine if text is positive, negative, neutral, or mixed | Customer review -> 'Negative, 85% confidence' |
| Key Phrase Extraction | Pull out the most important phrases from text | 'minimum payment due' from a billing letter |
| PII Detection | Identify personally identifiable information in text | Names, phone numbers, credit card numbers, SSNs |
| Language Detection | Identify the language of the input text | 'English, 99% confidence' |
| Targeted Sentiment | Sentiment about specific entities in text | How customers feel about a specific product feature |
| Syntax Analysis | Identify parts of speech (noun, verb, adjective) | Grammatical parsing of text |
Supported Document Types:
Text, PDF, Word (.docx), images
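The out-of-the-box capabilities above are plain API calls. A minimal sketch of sentiment analysis with boto3 (the AWS call is commented out and assumes configured credentials; the sample response below is illustrative, not real output):

```python
# import boto3
# comprehend = boto3.client("comprehend")
# response = comprehend.detect_sentiment(
#     Text="The checkout process was confusing and slow.",
#     LanguageCode="en",
# )

def dominant_sentiment(response):
    """Return (label, confidence) from a DetectSentiment response."""
    label = response["Sentiment"]  # POSITIVE | NEGATIVE | NEUTRAL | MIXED
    return label, response["SentimentScore"][label.capitalize()]

# Abbreviated shape of a DetectSentiment response:
sample = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {"Positive": 0.01, "Negative": 0.85, "Neutral": 0.12, "Mixed": 0.02},
}
print(dominant_sentiment(sample))  # ('NEGATIVE', 0.85)
```

The response always carries per-label confidence scores alongside the winning label, which is what lets you set your own confidence thresholds before acting on a result.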
Custom Capabilities (Requires Training Data):
1. Custom Classification:
Train Comprehend to categorize documents into YOUR own defined categories.
- You provide labeled examples (minimum ~10 per class)
- Store training data (CSV format) in Amazon S3
- Comprehend trains a custom classifier internally
- Available as: real-time, synchronous batch, or asynchronous analysis
- Use case: Route incoming support emails to Billing, Technical Support, Complaint, or Feature Request categories
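A sketch of the two-column CSV (label, then document text) described above; the category names and example texts here are hypothetical:

```python
import csv, io

# Hypothetical labeled examples in (CLASS, TEXT) form.
examples = [
    ("BILLING", "I was charged twice for my subscription this month."),
    ("TECH_SUPPORT", "The app crashes every time I open the settings page."),
    ("COMPLAINT", "The agent I spoke with yesterday was unhelpful."),
]

buf = io.StringIO()
csv.writer(buf).writerows(examples)
training_csv = buf.getvalue()
# Upload this file to S3, then point Comprehend's custom classifier
# training job at that S3 location.
print(training_csv.splitlines()[0])
```

Remember the exam-relevant detail: the training data lives in S3 as CSV, and you need roughly 10+ labeled examples per class.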
2. Custom Entity Recognition:
Train Comprehend to identify entities specific to YOUR business -- not just generic person/place/date.
- Provide a list of target entities and documents containing them
- Comprehend learns what your specific entity looks like in context
- Use case: Automatically extract policy numbers, product model numbers, or customer escalation phrases from documents
Analysis Modes:
- Real-time -- synchronous; immediate results for single documents
- Asynchronous (Batch) -- submit large volumes of documents from S3; process offline
Key Exam Scenario:
Customer sends emails -> Comprehend classifies them as billing/support/complaints -> route to correct team automatically.
Key Terms
| Term | Definition |
|---|---|
| Amazon Comprehend | A fully managed NLP service that extracts entities, key phrases, sentiment, PII, and language from unstructured text. Supports custom classifiers and custom entity recognizers. |
| Named Entity Recognition (NER) | Comprehend's ability to automatically identify and categorize specific entities in text -- people, organizations, locations, dates, quantities -- without any training required. |
| Sentiment Analysis | Comprehend's capability to determine whether text has a positive, negative, neutral, or mixed emotional tone. Used to analyze customer feedback and support interactions. |
| PII Detection (Comprehend) | Comprehend's ability to identify personally identifiable information -- names, credit card numbers, phone numbers, SSNs -- in text, enabling automated redaction or compliance workflows. |
| Custom Classification (Comprehend) | A Comprehend feature where users provide labeled training examples to teach Comprehend to categorize documents into custom business-specific categories (e.g., billing, support, complaints). |
| Custom Entity Recognition (Comprehend) | A Comprehend feature that allows training the model to recognize business-specific entities (e.g., policy numbers, product codes) by providing labeled examples of those entities in context. |
- Comprehend = NLP service. Use it for: sentiment, entities, PII, key phrases, language detection, document classification.
- Custom Classification = YOU define the categories. Comprehend learns from your labeled examples.
- Custom Entity Recognition = YOU define the entity types. Used for business-specific terms not in standard NER.
- PII detection is a built-in, out-of-the-box Comprehend feature -- no training needed.
- Comprehend Medical = separate service for medical text. Comprehend = general text.
- Training data for custom models -> stored in Amazon S3 in CSV format.
Practice Questions
Q1. A company receives thousands of customer support emails daily and wants to automatically route them to the correct team (billing, technical support, or general complaints) without manual review. Which Comprehend feature enables this?
- Named Entity Recognition -- to identify the customer's name in each email
- Custom Classification -- to categorize emails into custom business-defined categories
- Sentiment Analysis -- to route negative emails to the complaints team
- PII Detection -- to identify customer accounts in the emails
Answer: B
Custom Classification in Amazon Comprehend allows you to define your own document categories (billing, technical support, complaints) and train Comprehend with labeled examples of each. Once trained, it automatically classifies incoming emails into the correct category for routing.
Q2. A legal firm wants to process thousands of contracts stored in S3 and automatically redact any personally identifiable information before sharing them externally. Which Amazon Comprehend capability is MOST relevant?
- Custom Entity Recognition -- to identify PII as a custom entity type
- Key Phrase Extraction -- to extract important legal terms
- PII Detection -- built-in capability to identify names, SSNs, credit card numbers, and other PII in text
- Custom Classification -- to classify contracts by PII risk level
Answer: C
Amazon Comprehend includes a built-in PII Detection capability that identifies personally identifiable information (names, phone numbers, SSNs, credit card numbers, etc.) in text without any custom training. Combined with batch processing from S3, this enables automated PII redaction workflows.
Q3. A news organization wants to automatically identify and tag all people, organizations, and locations mentioned in articles. Which Comprehend capability provides this out-of-the-box?
- Custom Classification
- Named Entity Recognition (NER)
- Sentiment Analysis
- Key Phrase Extraction
Answer: B
Named Entity Recognition (NER) is a built-in Comprehend capability that automatically identifies and categorizes entities like people, organizations, locations, dates, and quantities in text -- no training required.
Q4. A retail company wants to analyze customer product reviews to understand if customers feel positive, negative, or neutral about their purchases. Which Comprehend feature should they use?
- Key Phrase Extraction
- Language Detection
- Sentiment Analysis
- Custom Entity Recognition
Answer: C
Sentiment Analysis determines whether text has a positive, negative, neutral, or mixed emotional tone. This is perfect for analyzing customer reviews to understand overall customer satisfaction and feelings about products.
Q5. What format must training data be in when creating a Custom Classification model in Amazon Comprehend?
- JSON stored in DynamoDB
- CSV stored in Amazon S3
- XML stored in Amazon RDS
- Parquet stored in Redshift
Answer: B
Amazon Comprehend Custom Classification requires training data to be stored in CSV format in Amazon S3. The CSV contains labeled examples that Comprehend uses to learn your custom document categories.
Q6. A company receives documents in multiple languages and needs to detect which language each document is written in before processing. Which Comprehend capability handles this?
- Custom Entity Recognition
- Targeted Sentiment
- Language Detection
- Syntax Analysis
Answer: C
Language Detection is a built-in Comprehend capability that identifies the language of input text. This is useful for routing documents to appropriate processing pipelines based on their language.
Amazon Translate
What is Amazon Translate?
Amazon Translate is a fully managed neural machine translation service that provides accurate, natural-sounding language translation at scale.
Core Capabilities:
- Translate text between a wide range of language pairs
- Translate entire documents (plain text, HTML, .docx)
- Batch translation of many files at once using S3 as input/output
- Supports real-time API calls for individual translation requests
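A real-time translation request might look like this sketch (the AWS call is commented out and assumes configured credentials; the sample response is illustrative):

```python
# import boto3
# translate = boto3.client("translate")
# response = translate.translate_text(
#     Text="Hello, how can I help you today?",
#     SourceLanguageCode="auto",   # let Translate detect the source language
#     TargetLanguageCode="fr",
# )

def unpack(response):
    """Return (translated_text, detected_source_language)."""
    return response["TranslatedText"], response["SourceLanguageCode"]

# Abbreviated shape of a TranslateText response:
sample = {
    "TranslatedText": "Bonjour, comment puis-je vous aider ?",
    "SourceLanguageCode": "en",
    "TargetLanguageCode": "fr",
}
print(unpack(sample)[1])  # en
```

Passing 'auto' as the source language is what lets a single pipeline handle documents in mixed, unknown languages.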
Advanced Features:
Custom Terminology:
- Define how specific terms should be translated
- Essential for brand names, product names, character names, or technical jargon that need consistent translation
- Provided as a dictionary file in CSV, TSV, or TMX format
- Example: Ensure 'EC2' is always translated as 'EC2' and not as a generic phrase
Parallel Data (Style Customization):
- Customize the STYLE or FORMALITY of translation
- Example: 'How are you?' in French for an informal context -> 'Comment ça va?' (casual). For a law office -> 'Comment allez-vous?' (formal)
- Use parallel data to define input examples and their desired translated style
Pricing:
Pay per character translated -- billed based on the volume of text processed.
Use Cases:
- Localizing a website or app for international users
- Translating large batches of customer support tickets
- Multilingual content generation
- Real-time translation in a chat application
Exam Summary:
Amazon Translate = neural machine translation service. Key customization features: Custom Terminology (specific term translations) and Parallel Data (translation style/formality).
Key Terms
| Term | Definition |
|---|---|
| Amazon Translate | A fully managed neural machine translation service for accurate, natural-sounding language translation at scale. Supports real-time, document, and batch translation. |
| Custom Terminology (Translate) | A dictionary that ensures specific terms (brand names, product names, acronyms) are translated consistently and correctly, overriding the default translation. |
| Parallel Data (Translate) | Example input-output translation pairs that customize the style or formality of Amazon Translate's output -- e.g., formal vs. informal language style. |
- Translate = language translation service. Simple and specific -- if the scenario mentions translating content to another language -> Translate.
- Custom Terminology = controls HOW specific terms are translated (brand names, product names).
- Parallel Data = controls the STYLE of translation (formal vs. informal register).
- Batch translation uses S3 as input and output -- useful for translating large document collections.
Practice Questions
Q1. A global e-commerce company wants to translate its product catalog into 12 languages but needs to ensure its brand name 'NovaBike' is never translated into local equivalents -- it must always appear as 'NovaBike' in all languages. Which Translate feature enables this?
- Parallel Data -- to provide formal translation style examples
- Custom Terminology -- to define that 'NovaBike' should always remain untranslated
- Batch Translation -- to process all catalog items simultaneously
- Language Detection -- to identify the source language of each product description
Answer: B
Custom Terminology allows you to define specific terms that should be translated in a particular way -- or not translated at all. Brand names like 'NovaBike' that must appear consistently across all languages should be added to a Custom Terminology dictionary.
Q2. A law firm needs to translate client communications into French but requires a formal tone appropriate for legal correspondence. Which Amazon Translate feature enables style customization?
- Custom Terminology
- Parallel Data
- Batch Translation
- Real-time API
Answer: B
Parallel Data allows you to customize the style and formality of Amazon Translate's output. By providing example translations in your desired style (formal legal language), Translate learns to produce appropriately formal translations.
Q3. How is Amazon Translate priced?
- Fixed monthly fee per language pair
- Per character translated
- Per document regardless of size
- Free for all use cases
Answer: B
Amazon Translate is billed based on the number of characters translated. This pay-per-character model means you only pay for the actual volume of text processed.
Q4. A company needs to translate 10,000 product descriptions stored in S3 to multiple languages. What is the most efficient approach?
- Call the real-time API 10,000 times sequentially
- Use Amazon Translate batch translation with S3 as input and output
- Use Amazon Comprehend to classify and translate
- Build a custom translation model in SageMaker
Answer: B
Amazon Translate supports batch translation of many files at once using S3 as both input and output. This is the most efficient approach for processing large volumes of documents compared to sequential API calls.
Q5. What is the difference between Custom Terminology and Parallel Data in Amazon Translate?
- Custom Terminology controls pronunciation; Parallel Data controls spelling
- Custom Terminology controls specific term translations; Parallel Data controls translation style/formality
- Both are identical features with different names
- Custom Terminology is for batch; Parallel Data is for real-time
Answer: B
Custom Terminology defines HOW specific terms should be translated (or kept untranslated, like brand names). Parallel Data customizes the STYLE of translation (formal vs. informal register) by providing example input-output translation pairs.
Amazon Transcribe
What is Amazon Transcribe?
Amazon Transcribe is a fully managed Automatic Speech Recognition (ASR) service that converts audio speech into accurate text using deep learning.
Core Capabilities:
- Convert spoken audio (microphone, phone call, media file) to text
- Supports real-time streaming transcription
- Supports batch transcription of audio files from S3
- Automatic punctuation and formatting
Key Features:
PII Redaction:
- Automatically detects and removes PII from transcription output
- Redacts: names, phone numbers, SSNs, credit card numbers, dates of birth
- Use case: Transcribing customer support calls while removing sensitive customer data
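A batch transcription job with PII redaction enabled might be configured like this sketch (the AWS call is commented out; the bucket, file, and job names are hypothetical):

```python
# Parameters for a batch job that transcribes a support call from S3
# and redacts PII in the transcript output.
job_params = {
    "TranscriptionJobName": "support-call-0001",
    "Media": {"MediaFileUri": "s3://example-bucket/calls/call-0001.wav"},
    "MediaFormat": "wav",
    "LanguageCode": "en-US",
    "ContentRedaction": {
        "RedactionType": "PII",        # redact names, card numbers, SSNs, etc.
        "RedactionOutput": "redacted", # keep only the redacted transcript
    },
}
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(**job_params)
print(job_params["ContentRedaction"]["RedactionType"])  # PII
```

This maps directly onto the exam scenario 'transcribe customer calls and remove PII': one job configuration, no separate post-processing step.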
Automatic Language Identification:
- Detect and handle multiple languages within the same audio stream
- Example: Seamlessly transcribe a conversation that switches between English and French
Improving Transcription Accuracy:
| Feature | What It Improves | How |
|---|---|---|
| Custom Vocabulary | Recognition of specific WORDS | Provide a list of domain-specific words, acronyms, brand names, and pronunciation hints |
| Custom Language Model | Understanding of CONTEXT | Train Transcribe on domain-specific text data so it understands industry terminology in context |
Use BOTH together for highest accuracy.
Example: Without customization, 'AWS Microservices' might be transcribed as 'USA micro services'. With a Custom Vocabulary for 'AWS' and a Custom Language Model trained on IT text, it correctly transcribes 'AWS Microservices'.
Toxicity Detection:
- Detects toxic speech in audio using BOTH speech cues (tone, pitch) AND text cues (profanity, hate speech)
- Categories detected: sexual harassment, hate speech, threats, abuse, profanity, insults, graphic content
- The combination of audio tone analysis + text analysis makes this more powerful than text-only detection
Use Cases:
- Transcribing customer service calls for quality assurance
- Automated closed captioning and subtitling
- Creating searchable archives from recorded meetings
- Generating metadata for media assets
Exam Tip:
Amazon Transcribe = speech-to-text (audio -> text). Amazon Polly = text-to-speech (text -> audio). These are OPPOSITES.
Key Terms
| Term | Definition |
|---|---|
| Amazon Transcribe | A fully managed ASR (Automatic Speech Recognition) service that converts speech to text using deep learning. Supports real-time and batch transcription. |
| Automatic Speech Recognition (ASR) | The deep learning technology behind Amazon Transcribe that converts audio waveforms to text accurately and efficiently. |
| Custom Vocabulary (Transcribe) | A user-provided list of domain-specific words, acronyms, and brand names that improves recognition of specific TERMS in transcription. |
| Custom Language Model (Transcribe) | A model trained on domain-specific text data that improves Transcribe's understanding of CONTEXT -- how specific terms are used in a particular industry or domain. |
| Toxicity Detection (Transcribe) | A Transcribe feature that uses both audio cues (tone, pitch) and text cues (profanity, hate speech) to detect toxic speech across categories: harassment, threats, hate speech, abuse, etc. |
| PII Redaction (Transcribe) | Transcribe's ability to automatically detect and remove personally identifiable information from transcription output, enabling compliant handling of sensitive conversations. |
- Transcribe = SPEECH to TEXT. Polly = TEXT to SPEECH. Opposites -- memorize both directions.
- Custom Vocabulary = specific WORDS. Custom Language Model = domain CONTEXT. Both together = maximum accuracy.
- Toxicity Detection = uses BOTH audio (tone/pitch) + text (words) -- not just text analysis.
- PII Redaction in Transcribe automatically removes names, phone numbers, SSNs from transcription output.
- Auto Language Identification allows Transcribe to handle MULTILINGUAL audio in a single stream.
- Use case: 'transcribe customer calls and remove PII' -> Amazon Transcribe with PII redaction enabled.
Practice Questions
Q1. A healthcare company transcribes patient calls using Amazon Transcribe but finds that medical drug names and procedure codes are frequently transcribed incorrectly. Which feature(s) should they enable?
- PII Redaction -- to remove drug names from the transcription
- Custom Vocabulary and Custom Language Model -- to improve recognition of medical terms and their context
- Automatic Language Identification -- to detect which language each drug name is from
- Toxicity Detection -- to flag inappropriate language in patient calls
Answer: B
Custom Vocabulary adds specific medical drug names and procedure codes to Transcribe's recognition dictionary. Custom Language Model trains Transcribe on medical domain text so it understands the context in which these terms appear. Together, they deliver the highest transcription accuracy for specialized medical terminology.
Q2. A customer service center wants to transcribe support calls while automatically removing customer names, phone numbers, and credit card numbers from the output. Which Transcribe feature enables this?
- Custom Vocabulary
- Automatic Language Identification
- PII Redaction
- Toxicity Detection
Answer: C
PII Redaction automatically detects and removes personally identifiable information (names, phone numbers, SSNs, credit card numbers) from transcription output, enabling compliant handling of sensitive customer conversations.
Q3. What is the relationship between Amazon Transcribe and Amazon Polly?
- They are the same service with different names
- They are opposites -- Transcribe converts speech to text; Polly converts text to speech
- Transcribe is for video; Polly is for audio only
- Both convert text to speech in different languages
Answer: B
Amazon Transcribe and Amazon Polly are opposite services. Transcribe uses ASR (Automatic Speech Recognition) to convert speech audio into text. Polly uses TTS (Text-to-Speech) to convert written text into spoken audio.
Q4. A moderation team wants to detect toxic speech in recorded audio, using both the words spoken AND the tone of voice. Which Transcribe feature provides this?
- PII Redaction
- Custom Language Model
- Toxicity Detection
- Custom Vocabulary
Answer: C
Toxicity Detection analyzes BOTH audio cues (tone, pitch) AND text cues (profanity, hate speech) to detect toxic speech. This dual-signal approach is more powerful than text-only toxicity detection.
Q5. A global company has call recordings where speakers switch between English and Spanish within the same conversation. How can Amazon Transcribe handle this?
- It cannot -- separate recordings are needed for each language
- Automatic Language Identification can detect and handle multiple languages in the same audio stream
- Custom Vocabulary must include words from both languages
- PII Redaction automatically translates between languages
Answer: B
Automatic Language Identification allows Amazon Transcribe to detect and handle multiple languages within the same audio stream, seamlessly transcribing conversations that switch between languages.
Amazon Polly
What is Amazon Polly?
Amazon Polly is a fully managed text-to-speech (TTS) service that converts written text into natural-sounding human speech using deep learning. It's the OPPOSITE of Amazon Transcribe.
Voice Engines (Newest to Oldest):
| Engine | Quality | Best For |
|---|---|---|
| Generative | Most expressive, adaptive speech using GenAI | High-quality, natural conversational applications |
| Long-form | High quality for longer content | Audiobooks, long-form narration |
| Neural | Human-like, more natural than standard | Most production use cases |
| Standard | Basic TTS, original engine | Legacy/simple use cases |
Advanced Features:
Lexicons:
- Define how specific text strings should be PRONOUNCED (not translated)
- Example: When Polly sees 'AWS', speak 'Amazon Web Services'
- Example: When Polly sees 'W3C', speak 'World Wide Web Consortium'
- Different from Translate's Custom Terminology -- Lexicons are about pronunciation, not translation
SSML (Speech Synthesis Markup Language):
- XML-based markup language that gives fine-grained control over speech output
- Controls: pauses, emphasis, whisper, pronunciation of abbreviations, speaking rate, pitch
- Example: '<speak>Hello <break time="1s"/> how are you?</speak>' -> produces a 1-second pause between 'Hello' and 'how are you?'
- Enables highly customized, natural-sounding outputs for specific contexts
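Feeding SSML to Polly means marking the input as SSML rather than plain text. A sketch (the AWS call is commented out and assumes configured credentials; the voice choice is illustrative):

```python
# SSML input: a 1-second pause between the greeting and the question.
ssml = '<speak>Hello <break time="1s"/> how are you?</speak>'

# import boto3
# polly = boto3.client("polly")
# audio = polly.synthesize_speech(
#     Text=ssml,
#     TextType="ssml",       # tell Polly the input is SSML, not plain text
#     VoiceId="Joanna",
#     Engine="neural",
#     OutputFormat="mp3",
# )
# with open("hello.mp3", "wb") as f:
#     f.write(audio["AudioStream"].read())
print("<break" in ssml)  # True
```

If TextType is omitted, Polly treats the input as plain text and would read the markup tags aloud instead of interpreting them.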
Speech Marks:
- Metadata that tells you WHERE in the audio a specific word or sentence starts and ends
- Use cases: lip-syncing animated characters, highlighting words in a karaoke-style display as they are spoken
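Speech marks arrive as newline-delimited JSON, one object per word, giving the millisecond offset and character span of each word. A parsing sketch (the two lines below are sample data in that shape, not real Polly output):

```python
import json

# Sample newline-delimited speech-mark output for two words.
raw = "\n".join([
    '{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}',
    '{"time":732,"type":"word","start":6,"end":9,"value":"how"}',
])

marks = [json.loads(line) for line in raw.splitlines()]
for m in marks:
    # 'time' is the audio offset in ms; 'start'/'end' index into the input text,
    # which is exactly what a karaoke-style highlighter needs.
    print(m["time"], m["value"])
```

A UI can walk this list during playback and highlight the word whose 'time' has just elapsed.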
Use Cases:
- Applications that talk (e.g., navigation systems, customer service bots)
- Accessibility features for visually impaired users
- Audiobook generation
- E-learning content
- IVR (Interactive Voice Response) systems
Quick Reference: Transcribe vs. Polly:
| Service | Direction | Technology |
|---|---|---|
| Amazon Transcribe | Audio -> Text | ASR (speech recognition) |
| Amazon Polly | Text -> Audio | TTS (speech synthesis) |
Key Terms
| Term | Definition |
|---|---|
| Amazon Polly | A fully managed text-to-speech service that converts written text into natural-sounding speech. The opposite of Amazon Transcribe (which does speech-to-text). |
| SSML (Speech Synthesis Markup Language) | An XML-based markup language used with Polly to control how text is spoken -- adding pauses, emphasis, whispers, pronunciation hints, speaking rate adjustments, and more. |
| Lexicons (Polly) | Custom pronunciation dictionaries that tell Polly how to pronounce specific text strings -- e.g., expand 'AWS' to say 'Amazon Web Services'. |
| Speech Marks (Polly) | Metadata output from Polly that identifies where in the audio each word or sentence begins and ends. Used for lip-syncing animations or highlighting text as it is spoken. |
| Generative Voice Engine (Polly) | Polly's newest and most expressive voice engine, powered by generative AI, producing the most natural and adaptive speech quality. |
- Polly = TEXT to SPEECH. Transcribe = SPEECH to TEXT. These appear together as a trick question.
- Lexicons = control PRONUNCIATION of specific strings (AWS -> 'Amazon Web Services').
- SSML = fine-grained control of HOW text is spoken (pauses, emphasis, whisper, speaking rate).
- Speech Marks = WHERE in audio each word starts/ends. Used for lip-sync and word highlighting.
- Generative > Long-form > Neural > Standard -- newest to oldest, best to most basic quality.
- Polly Lexicons vs. Translate Custom Terminology: Lexicons = pronunciation. Terminology = translation mapping.
Practice Questions
Q1. An e-learning platform uses Amazon Polly to read course content aloud. They need words to be highlighted in the UI as they are spoken, synchronized with the audio. Which Polly feature enables this?
- SSML -- to add markup tags that control word timing
- Lexicons -- to define how each word is pronounced
- Speech Marks -- to get metadata about where each word starts and ends in the audio
- Generative Engine -- to produce the most accurate timing for each word
Answer: C
Speech Marks is a Polly feature that provides metadata identifying exactly where in the audio each word or sentence begins and ends. This timing information enables the UI to synchronize word highlighting with the audio playback, creating a karaoke-style reading experience.
Q2. A company wants Amazon Polly to pronounce 'AWS' as 'Amazon Web Services' every time it appears in their text-to-speech application. Which feature enables this?
- SSML markup
- Lexicons
- Speech Marks
- Neural Engine
Answer: B
Lexicons define how specific text strings should be pronounced. By adding 'AWS' to a lexicon with the pronunciation 'Amazon Web Services', Polly will expand the acronym every time it encounters it.
Q3. A developer needs precise control over speech output including pauses, emphasis, whispers, and speaking rate. Which Polly feature provides this fine-grained control?
- Lexicons
- SSML (Speech Synthesis Markup Language)
- Speech Marks
- Custom Vocabulary
Answer: B
SSML is an XML-based markup language that gives fine-grained control over how text is spoken -- allowing pauses, emphasis, whispers, pronunciation hints, speaking rate adjustments, and more.
Q4. Which Amazon Polly voice engine provides the most natural, expressive speech using generative AI?
- Standard Engine
- Neural Engine
- Long-form Engine
- Generative Engine
Answer: D
The Generative Engine is Polly's newest and most expressive voice engine, powered by generative AI. It produces the most natural and adaptive speech quality, ideal for high-quality conversational applications.
Q5. What is the key difference between Polly Lexicons and Amazon Translate Custom Terminology?
- Both control how words are translated between languages
- Lexicons control pronunciation; Custom Terminology controls translation mapping
- Both are the same feature in different services
- Lexicons are for batch processing; Custom Terminology is for real-time
Answer: B
Polly Lexicons control how specific text strings are PRONOUNCED in speech output. Amazon Translate Custom Terminology controls how specific terms are TRANSLATED between languages. Different purposes for different services.
Amazon Rekognition
What is Amazon Rekognition?
Amazon Rekognition is a fully managed computer vision service that uses machine learning to analyze images and videos -- detecting objects, people, text, scenes, faces, and inappropriate content.
Core Capabilities:
| Capability | What It Detects | Use Case |
|---|---|---|
| Label Detection | Objects, scenes, activities | 'Person', 'Car', 'Skateboard', 'Outdoors' with confidence scores |
| Facial Analysis | Age range, gender, emotions, facial attributes | Emotion detection, attendance tracking |
| Face Comparison | How similar two faces are | Verifying if two photos are the same person |
| Face Search | Match a face against a database of known faces | Celebrity recognition, user verification |
| Text in Image (OCR) | Detect and extract text from images | Read signs, license plates, on-screen text |
| Content Moderation | Unsafe/inappropriate content | Filter adult, violent, or offensive content |
| Celebrity Recognition | Identify well-known public figures | Media archiving, content tagging |
| Face Liveness | Verify a face is a real, live person (not a photo) | Fraud prevention in identity verification |
| Personal Protective Equipment (PPE) Detection | Detect face masks, hard hats, safety vests | Workplace safety compliance |
| Pathing | Track movement of people or objects across frames | Sports analytics, retail foot traffic |
Custom Labels:
Extend Rekognition to recognize YOUR OWN objects, logos, or products -- not just generic categories.
- Provide labeled training images (only a few hundred needed)
- Upload to Amazon S3 -> Rekognition trains a custom model
- After training, new images are analyzed for YOUR specific objects/logos
- Use case: The NFL detects its logos in social media photos
Content Moderation in Detail:
- Automatically flags inappropriate, unsafe, or offensive content in images/videos
- Reduces human review burden to ~1-5% of total content volume (only ambiguous cases go to humans)
- Human review integration: use Amazon Augmented AI (A2I) for final decisions on flagged content
- Custom Moderation Adapter: extend default moderation by training Rekognition with your own labeled examples for domain-specific moderation rules
- Human-reviewed results can feed BACK into Rekognition training to continuously improve accuracy
Content Moderation API Flow (Exam Scenario):
User request -> Chatbot generates image -> Send to Rekognition DetectModerationLabels API -> If SAFE -> return image to user; if UNSAFE -> block/flag image
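The decision step in that flow can be sketched as a small function. The real API call is `detect_moderation_labels` on the boto3 Rekognition client; the response dict below is hand-built to mirror its documented shape (`ModerationLabels` with `Name`, `ParentName`, `Confidence`) for illustration:

```python
# Real call (not executed here):
#   rekognition = boto3.client('rekognition')
#   resp = rekognition.detect_moderation_labels(
#       Image={'Bytes': image_bytes}, MinConfidence=60)
# Illustrative sample response in the same shape:
SAMPLE_RESPONSE = {
    "ModerationLabels": [
        {"Name": "Suggestive", "ParentName": "", "Confidence": 83.2},
    ]
}

def is_safe(response, min_confidence=60.0):
    """An image is treated as safe only if no moderation label
    meets the confidence threshold."""
    return not any(label["Confidence"] >= min_confidence
                   for label in response["ModerationLabels"])

print(is_safe(SAMPLE_RESPONSE))           # False -> block/flag the image
print(is_safe({"ModerationLabels": []}))  # True  -> return image to user
```

In production, the borderline cases (labels near the threshold) are the ~1-5% that would be routed to human reviewers via A2I rather than decided automatically.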
Supported Input Formats:
Images: JPEG, PNG
Videos: via S3 bucket (for batch) or Kinesis Video Streams (for real-time)
Key Terms
| Term | Definition |
|---|---|
| Amazon Rekognition | A fully managed computer vision service for analyzing images and videos. Detects objects, faces, text, scenes, unsafe content, celebrities, and more using pre-trained ML models. |
| Custom Labels (Rekognition) | A feature allowing users to train Rekognition to recognize custom objects, logos, or products by providing labeled training images. Used when default labels don't cover business-specific needs. |
| Content Moderation (Rekognition) | Rekognition's ability to automatically detect inappropriate, offensive, or unsafe content in images and videos. Reduces human review volume to 1-5% of content. |
| Face Liveness (Rekognition) | A Rekognition feature that verifies whether the face in front of a camera is a real, live person -- not a photo or spoofed identity. Used in identity verification workflows. |
| DetectModerationLabels API | The Amazon Rekognition API that analyzes an image and returns labels indicating whether the content is safe or contains inappropriate material -- enabling automated content moderation in applications. |
| Amazon Augmented AI (A2I) | An AWS service that routes low-confidence ML predictions (from Rekognition, Textract, or custom models) to human reviewers for manual validation. Integrates with Mechanical Turk or private workforce. |
- Rekognition = images and videos. Know ALL the capabilities: labels, faces, text, moderation, celebrities, liveness, PPE.
- Custom Labels = train Rekognition on YOUR OWN objects/logos with just a few hundred images.
- Content Moderation -> Rekognition reduces human review to 1-5%. A2I handles the remaining ambiguous cases.
- Face Liveness = verify a REAL person (not a spoofed photo). Used in identity verification.
- DetectModerationLabels API = the specific API for content moderation in Rekognition.
- Rekognition text-in-image is DIFFERENT from Textract -- Rekognition is for detecting text in images; Textract is for structured document extraction.
Practice Questions
Q1. A social media company wants to automatically detect and remove explicit content from user-uploaded images before they are published. They need to minimize human review costs while maintaining safety standards. Which combination of services is BEST?
- Amazon Comprehend for sentiment analysis + Amazon Translate for content classification
- Amazon Rekognition Content Moderation to auto-flag content + Amazon A2I for human review of ambiguous cases
- Amazon Textract to extract image text + Amazon Comprehend to classify the text as inappropriate
- Amazon Lex with content filtering rules + Amazon Polly for content narration
Answer: B
Amazon Rekognition Content Moderation automatically detects inappropriate content, handling ~95-99% of cases automatically and reducing human review to just 1-5% of ambiguous cases. Amazon Augmented AI (A2I) then routes those ambiguous cases to human reviewers for final decisions.
Q2. A company wants Amazon Rekognition to identify when its company logo appears in social media photos. The default label detection does not recognize the logo. What should they do?
- Use Rekognition's Celebrity Recognition feature -- logos are treated like public figures
- Use Amazon Comprehend custom entities to define the logo as a text-based entity
- Use Rekognition Custom Labels -- train the model with labeled images containing the logo
- Use Amazon Textract to extract the logo text from photos
Answer: C
Rekognition Custom Labels allows companies to train Rekognition on their own specific objects, logos, and products by providing labeled training images. After training on images containing the logo, Rekognition can reliably detect that specific logo in new images.
Q3. A banking app needs to verify that the person applying for an account is a real live person and not a photo of someone else. Which Rekognition feature addresses this?
- Face Comparison
- Face Liveness
- Content Moderation
- Celebrity Recognition
Answer: B
Face Liveness verifies that the face in front of a camera is a real, live person -- not a photo or spoofed identity. This prevents fraud in identity verification workflows by detecting presentation attacks.
Q4. A construction company needs to verify that workers are wearing required safety equipment (hard hats, vests) in photos from job sites. Which Rekognition capability handles this?
- Label Detection
- Content Moderation
- Personal Protective Equipment (PPE) Detection
- Custom Labels
Answer: C
PPE Detection is a built-in Rekognition capability that detects personal protective equipment including face masks, hard hats, and safety vests in images, enabling automated workplace safety compliance monitoring.
Q5. What percentage of content does Amazon Rekognition Content Moderation typically process automatically, leaving only ambiguous cases for human review?
- 50%
- 75%
- 95-99%
- 100%
Answer: C
Rekognition Content Moderation automatically handles approximately 95-99% of content, reducing human review burden to just 1-5% of ambiguous cases. This dramatically reduces moderation costs while maintaining safety standards.
Q6. An image in a photo album contains text showing a street address. Which AWS service can detect and extract this text from the image?
- Amazon Comprehend
- Amazon Rekognition Text in Image
- Amazon Translate
- Amazon Polly
Answer: B
Rekognition Text in Image (OCR) detects and extracts text visible in photos and real-world scenes, such as signs, labels, license plates, and on-screen text. This is different from Textract, which extracts structured data from documents.
Amazon Lex
What is Amazon Lex?
Amazon Lex is a fully managed service for building conversational AI chatbots that can interact with users via voice or text. It is the same technology that powers Amazon Alexa.
Core Concepts:
| Concept | Definition | Example |
|---|---|---|
| Intent | What the user wants to accomplish | 'Book a hotel', 'Order pizza', 'Check account balance' |
| Utterance | A phrase that triggers an intent | 'I want to book a hotel' or 'Reserve a room' -> triggers BookHotel intent |
| Slot | An input parameter needed to fulfill the intent | City, check-in date, number of nights, room type for a hotel booking |
| Fulfillment | The action taken when all slots are filled | Invoke an AWS Lambda function to execute the booking |
How a Lex Bot Works:
- User says/types an utterance (e.g., 'Book a hotel for 3 nights in Paris')
- Lex identifies the intent (BookHotel)
- Lex checks which slots are filled (Paris = city [check], 3 nights [check]) and which are missing (check-in date [x])
- Lex asks for missing slots ('What day do you want to check in?')
- Once all slots are filled, Lex calls the configured AWS Lambda function
- Lambda performs the actual booking in the backend system
- Lex returns the confirmation message to the user
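The slot-collection loop above can be sketched as a toy function. This is illustrative logic only, not the Lex runtime API (a real client would call `recognize_text` on the `lexv2-runtime` client); the slot names and prompts are made up for the hotel example:

```python
# Required slots for a hypothetical BookHotel intent.
REQUIRED_SLOTS = ["city", "check_in_date", "nights"]
PROMPTS = {
    "city": "Which city?",
    "check_in_date": "What day do you want to check in?",
    "nights": "How many nights?",
}

def next_prompt(filled_slots: dict):
    """Return the follow-up question for the first missing slot, or None
    when every required slot is filled (i.e., ready for Lambda fulfillment)."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled_slots:
            return PROMPTS[slot]
    return None  # all slots filled -> invoke the fulfillment Lambda

# 'Book a hotel for 3 nights in Paris' fills city and nights, not the date:
print(next_prompt({"city": "Paris", "nights": 3}))
# -> 'What day do you want to check in?'
print(next_prompt({"city": "Paris", "nights": 3, "check_in_date": "2024-06-01"}))
# -> None
```

The key exam idea this illustrates: Lex keeps asking targeted follow-up questions until every required slot is filled, and only then triggers fulfillment.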
Two Creation Methods:
- Traditional Bot -- manually configure intents, utterances, slots, and fulfillment
- Generative AI Bot -- describe the bot in natural language; Bedrock generates the configuration automatically
Key Integrations:
- AWS Lambda -- fulfillment actions (execute bookings, database lookups)
- Amazon Connect -- deploy Lex bots in call center phone systems
- Amazon Comprehend -- add NLP understanding to classify customer intent
- Amazon Kendra -- connect to a document knowledge base for Q&A
Use Cases:
- Customer service chatbots
- Booking and reservation systems (hotels, flights, restaurants)
- IT help desk bots
- FAQ bots for websites
- Phone-based IVR (Interactive Voice Response) systems
Supports: Multiple languages
Exam Tip:
Amazon Lex = chatbot/conversational AI builder. When the scenario asks about building a chatbot with intents and slot collection -> Lex.
Key Terms
| Term | Definition |
|---|---|
| Amazon Lex | A fully managed service for building conversational AI chatbots using voice or text. Uses the same technology as Amazon Alexa. |
| Intent (Lex) | The goal or action a user wants to accomplish in a Lex bot -- e.g., 'Book a hotel', 'Check balance'. Each bot can have multiple intents. |
| Utterance (Lex) | A sample phrase that triggers a specific intent. Lex uses these examples to learn which phrases map to which intents. |
| Slot (Lex) | An input parameter that a Lex bot collects from the user to fulfill an intent. For hotel booking: city, check-in date, number of nights, room type are all slots. |
| Fulfillment (Lex) | The action executed by Lex once all required slots are collected -- typically invoking an AWS Lambda function to perform the backend operation. |
- Lex = chatbot builder. Intent = what user wants. Utterance = how they say it. Slot = parameters needed. Lambda = fulfills the action.
- Lex bots collect SLOTS (parameters) through conversation -- asking follow-up questions until all required info is gathered.
- Lex -> Lambda = the standard fulfillment pattern. Lambda performs the actual business logic.
- Amazon Connect integration = Lex-powered phone bots for call centers.
- Generative AI bot creation in Lex uses Amazon Bedrock to auto-generate bot configuration from a natural language description.
Practice Questions
Q1. A hotel chain wants to build a chatbot that allows guests to make room reservations by providing their preferred city, check-in date, number of nights, and room type via text or voice. Which service and concepts are MOST relevant?
- Amazon Kendra with intent recognition and document search
- Amazon Lex with intents, slots, and Lambda fulfillment
- Amazon Comprehend with custom classification for reservation requests
- Amazon Personalize with a hotel booking recipe
Answer: B
Amazon Lex is designed exactly for this use case. The booking intent is defined with slots (city, check-in date, nights, room type). Lex converses with the user to collect all slot values, then invokes a Lambda function to execute the reservation in the hotel's booking system.
Q2. In Amazon Lex, what is a 'slot'?
- A time period when the bot is available
- An input parameter that the bot collects from the user to fulfill an intent
- A backup response when the bot doesn't understand
- A connection to a database
Answer: B
In Lex, a slot is an input parameter needed to fulfill an intent. For example, a hotel booking intent needs slots for city, check-in date, number of nights, and room type. Lex asks follow-up questions until all required slots are filled.
Q3. A company wants to deploy their Amazon Lex chatbot to their phone-based customer service system. Which AWS service integrates with Lex for this purpose?
- Amazon Polly
- Amazon Connect
- Amazon Transcribe
- Amazon SNS
Answer: B
Amazon Connect is AWS's cloud contact center service that integrates with Amazon Lex to deploy chatbots in phone-based IVR (Interactive Voice Response) systems, enabling voice-based conversational AI for customer service.
Q4. What happens in Amazon Lex after all required slots for an intent have been collected from the user?
- The conversation ends automatically
- Lex invokes the configured AWS Lambda function for fulfillment
- Lex sends an email notification
- The user must manually confirm completion
Answer: B
Once all required slots are filled, Lex triggers fulfillment by invoking the configured AWS Lambda function. Lambda performs the actual business logic (database lookup, booking, API call) and returns a confirmation message to the user.
Q5. Amazon Lex now offers a way to create bots using natural language descriptions. What technology powers this capability?
- Amazon Comprehend
- Amazon SageMaker
- Amazon Bedrock (Generative AI)
- Amazon Translate
Answer: C
Generative AI bot creation in Amazon Lex uses Amazon Bedrock to automatically generate bot configuration (intents, utterances, slots) from a natural language description, simplifying bot development.
Amazon Personalize
What is Amazon Personalize?
Amazon Personalize is a fully managed machine learning service that enables developers to build applications with real-time, personalized recommendations -- the same technology powering Amazon.com product recommendations.
Key Positioning:
- No need to build, train, or deploy ML models from scratch
- Takes days, not months, to implement
- Integrates with S3 for batch data ingestion and real-time API for live data
- Exposes a customized recommendation API for your web/mobile apps
How It Works:
- Provide user interaction data, item catalog, and user metadata via S3 or real-time API
- Choose a Recipe (pre-built algorithm for a specific recommendation use case)
- Personalize trains and hosts the recommendation model
- Call the Personalize API from your application to get personalized recommendations in real-time
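The final step is a single runtime call once a campaign is trained. As a local stand-in, the toy below mimics the idea behind the simplest recipe (Popularity-Count) by ranking items on raw interaction counts; it is NOT the Personalize algorithm, and the sample interaction data is invented:

```python
from collections import Counter

# Real runtime call (not executed here; CAMPAIGN_ARN is a placeholder):
#   personalize_rt = boto3.client('personalize-runtime')
#   resp = personalize_rt.get_recommendations(
#       campaignArn=CAMPAIGN_ARN, userId='user-42', numResults=5)
interactions = [
    ("user-1", "item-a"), ("user-2", "item-a"), ("user-3", "item-b"),
    ("user-1", "item-b"), ("user-2", "item-b"), ("user-3", "item-c"),
]

def most_popular(events, num_results=2):
    """Rank items by interaction count -- a crude sketch of the
    Popularity-Count recipe's output shape."""
    counts = Counter(item for _user, item in events)
    return [item for item, _n in counts.most_common(num_results)]

print(most_popular(interactions))  # ['item-b', 'item-a']
```

The point of the managed service is that you never implement this ranking yourself: you pick a recipe, Personalize trains and hosts the model, and your app just calls the API.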
Recipes -- Pre-Built Algorithms:
Recipes are Personalize's name for pre-implemented ML algorithms, each designed for a specific recommendation scenario:
| Recipe Category | Recipe Name | Use Case |
|---|---|---|
| USER_PERSONALIZATION | User-Personalization-v2 | Recommend items for each specific user based on their history |
| PERSONALIZED_RANKING | Personalized-Ranking-v2 | Re-rank a list of items in order of relevance for a specific user |
| RELATED_ITEMS | Similar-Items | Recommend items similar to one the user is viewing ('customers also bought') |
| TRENDING | Trending-Now | Recommend currently trending or popular items |
| POPULARITY | Popularity-Count | Recommend the most popular items overall |
| NEXT_BEST_ACTION | Next-Best-Action | Recommend the next action or content a user should engage with |
| USER_SEGMENTATION | Item-Affinity | Group users into segments based on item preferences |
Critical Exam Rule: Recipes in Personalize are ALWAYS about RECOMMENDATIONS -- not forecasting, not classification, not any other task.
Delivery Channels:
- Website and mobile app APIs (real-time recommendations)
- Email and SMS marketing campaigns (batch personalization)
Use Cases:
- E-commerce: 'Customers who bought X also bought Y'
- Streaming: 'Recommended shows for you'
- News: 'Articles you might like'
- Retail: Personalized promotions and email campaigns
Key Terms
| Term | Definition |
|---|---|
| Amazon Personalize | A fully managed ML service for building real-time personalized recommendation systems -- the same technology behind Amazon.com's product recommendations. |
| Recipe (Personalize) | A pre-built ML algorithm in Amazon Personalize for a specific recommendation scenario. All recipes produce personalized recommendations -- not forecasts or classifications. |
| User-Personalization (Recipe) | A Personalize recipe that recommends items for each user based on their individual interaction history and preferences. |
| Personalized Ranking (Recipe) | A Personalize recipe that re-orders a provided list of items based on the relevance to a specific user's preferences. |
| Similar Items (Recipe) | A Personalize recipe that recommends items related to one the user is currently viewing -- 'customers who viewed this also viewed'. |
- Personalize = RECOMMENDATIONS ONLY. Not forecasting, not classification -- personalized recommendations.
- Recipes = pre-built algorithms. All Personalize recipes produce some form of recommendation.
- The exam may ask 'what service provides personalized product recommendations?' -> Amazon Personalize.
- Personalize is the SAME technology as Amazon.com's own recommendation engine -- key selling point.
- Personalize supports REAL-TIME API recommendations AND batch delivery via email/SMS campaigns.
Practice Questions
Q1. A streaming service wants to show each user a personalized list of movies ranked in order of likelihood they'll enjoy them, based on their viewing history. Which Amazon Personalize recipe is MOST appropriate?
- User-Personalization-v2 -- to recommend movies to each user from the full catalog
- Personalized-Ranking-v2 -- to re-rank a curated list of movies in order of relevance for each user
- Similar-Items -- to recommend movies similar to what the user just watched
- Trending-Now -- to recommend currently trending movies to all users
Answer: B
Personalized-Ranking-v2 re-orders a provided list of items in descending order of relevance for a specific user. When you have a curated list (e.g., this week's new releases) and want to personalize the order for each viewer, Personalized Ranking is the correct recipe.
Q2. An e-commerce site wants to show 'Customers who bought this also bought...' recommendations on product pages. Which Personalize recipe is MOST appropriate?
- User-Personalization-v2
- Personalized-Ranking-v2
- Similar-Items
- Trending-Now
Answer: C
Similar-Items recommends items related to one the user is currently viewing, perfect for 'customers who bought this also bought' or 'related products' scenarios on product detail pages.
Q3. What type of output does Amazon Personalize ALWAYS produce, regardless of which recipe is used?
- Forecasts
- Classifications
- Recommendations
- Translations
Answer: C
Amazon Personalize recipes ALWAYS produce personalized recommendations. It is not a forecasting service (that's Amazon Forecast) or a classification service (that's Comprehend). All Personalize recipes are recommendation-focused.
Q4. Amazon Personalize uses the same recommendation technology as which famous platform?
- Netflix
- Amazon.com
- Spotify
- YouTube
Answer: B
Amazon Personalize is built on the same ML technology that powers Amazon.com's product recommendations. This is a key selling point -- the same technology behind one of the world's most successful recommendation engines is available as a managed service.
Q5. A news website wants to recommend currently trending articles to all users (not personalized per user). Which Personalize recipe should they use?
- User-Personalization-v2
- Similar-Items
- Trending-Now
- Next-Best-Action
Answer: C
Trending-Now recommends currently trending or popular items across all users. This is ideal for highlighting what's popular right now, rather than personalized recommendations based on individual user history.
Amazon Textract
What is Amazon Textract?
Amazon Textract is a fully managed ML service that automatically extracts text, handwriting, forms, tables, and structured data from scanned documents and images -- going far beyond simple OCR.
What Makes Textract Different from Basic OCR:
Basic OCR reads raw characters. Textract understands STRUCTURE -- it knows the difference between a title, a table cell, a form field, and a value.
Extraction Capabilities:
| Capability | What It Extracts | Example |
|---|---|---|
| Raw Text | All text content from the document | Every word on a paystub |
| Layout | Document structure (title, section header, paragraph) | 'Earning Statements' identified as a title |
| Forms (Key-Value Pairs) | Field name + its associated value | 'Period Ending' -> '7/18/2008', 'SSN' -> '***-**-****' |
| Tables | Rows, columns, and cell values | Full earnings table with rates, hours, and totals |
| Queries | Natural language questions about document content | 'What is the year-to-date gross pay?' -> '$45,200' |
| Expense Analysis | Vendor info, line items, totals from receipts/invoices | Vendor: 'ABC Corp', Item: 'Laptop', Price: '$1,200' |
| ID Analysis | Standardized field extraction from government IDs | First name, last name, DOB, address, document number |
Supported Input Formats:
Images (JPEG, PNG), PDFs
Analysis Types:
- Synchronous (Real-time) -- for single pages/images
- Asynchronous (Batch) -- for multi-page PDFs via S3
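What "structured extraction" means in practice: an `AnalyzeDocument` response is a flat list of `Blocks`, where KEY_VALUE_SET blocks link to their WORD children (and KEY blocks to their VALUE blocks) via `Relationships`. The sketch below walks that graph, assuming the documented block shape; the sample blocks are hand-built and heavily trimmed for illustration:

```python
# Real call (not executed here):
#   textract.analyze_document(Document={'Bytes': doc}, FeatureTypes=['FORMS'])
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Pay"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Date"},
    {"Id": "w3", "BlockType": "WORD", "Text": "2024-01-15"},
]

def extract_key_values(blocks):
    """Resolve KEY blocks to their text and their linked VALUE text."""
    by_id = {b["Id"]: b for b in blocks}

    def child_text(block):
        words = [by_id[i]["Text"]
                 for rel in block.get("Relationships", [])
                 if rel["Type"] == "CHILD" for i in rel["Ids"]]
        return " ".join(words)

    pairs = {}
    for b in blocks:
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            value_ids = [i for rel in b.get("Relationships", [])
                         if rel["Type"] == "VALUE" for i in rel["Ids"]]
            pairs[child_text(b)] = " ".join(child_text(by_id[i]) for i in value_ids)
    return pairs

print(extract_key_values(SAMPLE_BLOCKS))  # {'Pay Date': '2024-01-15'}
```

This key-to-value linkage is precisely what basic OCR cannot give you: OCR would return "Pay Date 2024-01-15" as undifferentiated text, with no indication of which string labels which.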
Use Cases by Industry:
- Financial Services -- invoice processing, financial report extraction, loan document analysis
- Healthcare -- medical records, insurance claims, prescription extraction
- Public Sector -- tax forms, identity documents (passports, IDs), government applications
Textract vs. Rekognition Text-in-Image:
| Feature | Amazon Textract | Rekognition Text in Image |
|---|---|---|
| Purpose | Structured document extraction | Read text visible in photos/scenes |
| Context | Forms, tables, key-value pairs from documents | Signs, labels, numbers in photos |
| Best for | Scanned business documents | Text embedded in real-world images |
Key Terms
| Term | Definition |
|---|---|
| Amazon Textract | A fully managed ML service that extracts text, handwriting, forms, tables, and structured data from scanned documents and images -- far beyond basic character recognition. |
| Key-Value Pair Extraction (Textract) | Textract's ability to identify form fields and their corresponding values -- e.g., 'Pay Date' -> '2024-01-15', enabling structured data extraction from forms. |
| Table Extraction (Textract) | Textract's ability to detect and extract complete tables from documents, preserving the row/column structure of the data. |
| Queries (Textract) | A Textract feature that allows natural language questions about document content -- e.g., 'What is the total amount due?' -- and returns the specific extracted answer. |
- Textract = extract STRUCTURED data from documents (forms, tables, key-value pairs). Not just raw text.
- Textract vs. Rekognition: Textract = structured document extraction. Rekognition text = reading text in photos of real-world scenes.
- Textract supports: text, handwriting, forms, tables, queries, expense analysis, and ID document analysis.
- Use case: 'extract data from scanned invoices/medical records/IDs' -> Amazon Textract.
- Textract Queries = ask natural language questions about document content and get extracted answers.
Practice Questions
Q1. A bank processes thousands of loan application forms daily. Each form has standardized fields (applicant name, income, loan amount requested). They want to automatically extract all field-value pairs from scanned PDFs. Which service is MOST appropriate?
- Amazon Comprehend -- to extract entities like names and amounts from the text
- Amazon Rekognition -- to detect text in scanned form images
- Amazon Textract -- to extract key-value pairs from structured form documents
- Amazon Kendra -- to index and search loan application documents
Answer: C
Amazon Textract is specifically designed to extract key-value pairs from structured forms -- identifying both the field label ('Applicant Name') and its corresponding value ('John Smith'). This is more powerful than Comprehend's NER (which identifies entities in free text) or Rekognition's text detection (which reads raw characters in images).
Q2. What makes Amazon Textract different from basic OCR (Optical Character Recognition)?
- Textract only works with PDFs
- Textract understands document STRUCTURE -- forms, tables, key-value pairs -- not just raw characters
- Textract requires training on each document type
- Textract only extracts handwritten text
Answer: B
While basic OCR reads raw characters, Textract understands document STRUCTURE. It knows the difference between titles, table cells, form fields, and values -- enabling structured data extraction rather than just text reading.
Q3. A company wants to ask natural language questions about document content, like 'What is the total amount due?' and get direct answers. Which Textract feature enables this?
- Table Extraction
- Key-Value Pair Extraction
- Queries
- Layout Detection
Answer: C
Textract Queries allow you to ask natural language questions about document content and receive specific extracted answers. Instead of extracting all fields, you can ask targeted questions like 'What is the total amount due?' and get just that value.
Q4. An HR department needs to extract data from employee ID cards including name, date of birth, and document number. Which Textract capability handles this?
- Forms Extraction
- Table Extraction
- ID Analysis
- Expense Analysis
Answer: C
ID Analysis provides standardized field extraction from government-issued ID documents, extracting fields like first name, last name, date of birth, address, and document number in a structured format.
Q5. What is the difference between Amazon Textract and Amazon Rekognition Text-in-Image?
- They are the same feature
- Textract extracts structured data from documents; Rekognition reads text in real-world photos/scenes
- Textract is for audio; Rekognition is for images
- Rekognition extracts tables; Textract detects objects
Answer: B
Textract is designed for structured document extraction (forms, tables, invoices). Rekognition Text-in-Image reads text visible in real-world photos and scenes (signs, labels, license plates). Different tools for different contexts.
Amazon Kendra, Mechanical Turk, and Augmented AI (A2I)
Amazon Kendra -- Intelligent Document Search:
Amazon Kendra is a fully managed, ML-powered enterprise document search service. Unlike keyword search, Kendra understands the MEANING of questions and returns direct answers.
How It Works:
- Index documents from multiple sources: S3, SharePoint, OneDrive, Confluence, RDS, databases, FAQs, PDFs, HTML, Word, PowerPoint
- Build an internal ML-powered knowledge index
- Users ask natural language questions; Kendra returns direct answers -- not just a list of links
Key Features:
- Natural Language Queries: 'Where is the IT support desk?' -> Kendra returns 'First floor' (extracted from an HR document)
- Incremental Learning: Kendra learns from user feedback and clicks to improve future search results
- Fine-tuning: Customize relevance based on data freshness, document importance, or custom metadata filters
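"Direct answers, not just links" shows up in the response shape: Kendra's `query` returns `ResultItems` typed as ANSWER, QUESTION_ANSWER, or DOCUMENT. The sketch below prefers a direct answer over plain document hits, assuming that documented shape; the sample response and `INDEX_ID` placeholder are illustrative:

```python
# Real call (not executed here; INDEX_ID is a placeholder):
#   kendra = boto3.client('kendra')
#   resp = kendra.query(IndexId=INDEX_ID,
#                       QueryText='Where is the IT support desk?')
SAMPLE_RESPONSE = {
    "ResultItems": [
        {"Type": "ANSWER",
         "DocumentExcerpt": {"Text": "The IT support desk is on the first floor."}},
        {"Type": "DOCUMENT",
         "DocumentExcerpt": {"Text": "Facilities handbook, chapter 3..."}},
    ]
}

def top_answer(response):
    """Prefer a direct ANSWER result over ordinary document hits."""
    for item in response["ResultItems"]:
        if item["Type"] == "ANSWER":
            return item["DocumentExcerpt"]["Text"]
    return None

print(top_answer(SAMPLE_RESPONSE))
# -> 'The IT support desk is on the first floor.'
```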
Kendra vs. Q Business:
Both provide document-based Q&A, but Q Business is an enterprise assistant with authentication, plugins, and admin controls. Kendra is specifically a search service that can be embedded in any application.
Exam Tip: 'ML-powered document search service' -> Amazon Kendra
---
Amazon Mechanical Turk -- Human Crowdsourcing:
Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace that provides access to a distributed global workforce to perform simple, scalable human tasks.
Named After: the 1770 chess-playing 'Mechanical Turk' -- an apparent chess-playing automaton that was secretly operated by a hidden human.
How It Works:
- Requesters publish small tasks (HITs -- Human Intelligence Tasks) with a reward per task
- Workers worldwide accept and complete tasks for the reward
- Results are aggregated and returned to the requester
Common Task Types:
- Image labeling / classification for ML training data
- Data collection and validation
- Sentiment annotation
- Content moderation review
- Business process tasks (data entry, form filling)
Why It Matters for AI/ML:
The primary use of MTurk in AI workflows is DATA LABELING -- creating the labeled training datasets that supervised ML algorithms require. Labeling millions of images by hand is impractical for one team; MTurk distributes this across thousands of workers.
Integrations:
- Amazon A2I -- for human review of ML predictions
- Amazon SageMaker Ground Truth -- for labeling workflows
---
Amazon Augmented AI (A2I) -- Human Review of ML Predictions:
Amazon A2I enables human oversight of machine learning predictions when confidence is low or when random auditing is required.
How It Works:
- ML model makes a prediction (from Rekognition, Textract, or a custom model)
- A2I evaluates confidence:
- High confidence -> prediction returned directly to the application
- Low confidence -> routed to human reviewers
- Humans review and correct the prediction
- Results are stored in S3
- Approved predictions can be fed back into the ML model to improve it over time
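The confidence gate at the heart of that flow can be sketched as a toy router. This is illustrative logic only; the actual human-loop kickoff uses the `sagemaker-a2i-runtime` client's `start_human_loop` with a flow definition ARN, which the sketch merely stubs out in a comment, and the 0.80 threshold is an arbitrary example value:

```python
CONFIDENCE_THRESHOLD = 0.80  # example value; tune per use case

def route_prediction(prediction: dict):
    """High-confidence predictions go straight back to the application;
    low-confidence ones are routed to human reviewers."""
    if prediction["confidence"] >= CONFIDENCE_THRESHOLD:
        return ("application", prediction["label"])
    # In a real A2I workflow this branch would call:
    #   a2i.start_human_loop(HumanLoopName=..., FlowDefinitionArn=...,
    #                        HumanLoopInput={'InputContent': ...})
    return ("human_review", prediction["label"])

print(route_prediction({"label": "approved_claim", "confidence": 0.95}))
# -> ('application', 'approved_claim')
print(route_prediction({"label": "approved_claim", "confidence": 0.42}))
# -> ('human_review', 'approved_claim')
```

The human decisions collected on the low-confidence branch are what later feed back into the model to improve it, closing the loop described above.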
Supported Task Types:
- Image moderation (Rekognition)
- Document key-value extraction (Textract)
- Custom ML model predictions
Who Reviews? Three workforce options:
- Private Workforce -- your own employees (for confidential/sensitive data)
- Amazon Mechanical Turk -- 500,000+ independent contractors (for general tasks)
- Vendor Workforce -- pre-screened third-party vendors from AWS Marketplace (for specialized/confidential tasks)
Exam Pattern:
'Human review of ML predictions when confidence is low' -> Amazon Augmented AI (A2I)
'Crowdsourced human labeling of data' -> Amazon Mechanical Turk
'ML-powered document search' -> Amazon Kendra
Key Terms
| Term | Definition |
|---|---|
| Amazon Kendra | A fully managed ML-powered enterprise document search service that understands natural language questions and returns direct answers from indexed documents. |
| Incremental Learning (Kendra) | Kendra's ability to improve search result relevance over time by learning from user interaction feedback (clicks, ratings, and query behavior). |
| Amazon Mechanical Turk | A crowdsourcing marketplace that provides access to a global human workforce for simple, scalable tasks -- primarily used for data labeling, annotation, and content review in AI/ML workflows. |
| HIT (Human Intelligence Task) | A discrete unit of work posted on Amazon Mechanical Turk -- e.g., 'label this image as cat or dog'. Workers complete HITs for a defined reward. |
| Amazon Augmented AI (A2I) | An AWS service that adds human review to ML prediction workflows. Routes low-confidence predictions to human reviewers (employees, MTurk workers, or vendor workforce) for validation. |
| Private Workforce (A2I) | A company's own employees used as human reviewers in an A2I workflow -- the appropriate choice when predictions contain sensitive or confidential data. |
- Kendra = DOCUMENT SEARCH with natural language. Returns direct answers, not just links.
- Mechanical Turk = CROWDSOURCED HUMAN TASKS. Primary AI use case = data labeling for ML training.
- A2I = HUMAN REVIEW of ML PREDICTIONS when confidence is low. Not for training data -- for reviewing predictions.
- A2I workforce options: Private (employees) -> confidential data. MTurk -> general tasks. Vendors -> specialized tasks.
- The feedback loop: A2I reviews -> results feed BACK into the model -> model improves over time.
- Kendra vs. Q Business: Kendra = search service for embedding in apps. Q Business = full enterprise AI assistant.
Practice Questions
Q1. A company's ML model that classifies incoming insurance claims sometimes returns low-confidence predictions. They want a workflow where a human agent reviews all low-confidence claims before they are processed. Which service implements this?
- Amazon Mechanical Turk -- to crowdsource claim review to external workers
- Amazon Augmented AI (A2I) -- to route low-confidence predictions to human reviewers
- Amazon Comprehend -- to re-classify claims with higher confidence
- Amazon Kendra -- to search for similar claims in the archive
Answer: B
Amazon Augmented AI (A2I) is specifically designed to add human review to ML prediction workflows. It routes low-confidence predictions to a designated workforce (employees, MTurk, or vendors) for human validation, ensuring accuracy before business-critical decisions are made.
Q2. A company needs to build a large labeled dataset for training a custom ML model. They need thousands of images labeled by humans. Which AWS service provides access to a crowdsourced workforce for this task?
- Amazon A2I
- Amazon Kendra
- Amazon Mechanical Turk
- Amazon Comprehend
Answer: C
Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace providing access to a global workforce for simple, scalable human tasks. Its primary use in AI/ML is data labeling -- creating labeled training datasets that supervised ML algorithms require.
Q3. An enterprise wants employees to find answers to HR policy questions by asking natural language questions like 'How many vacation days do I get?' Which service provides ML-powered document search with direct answers?
- Amazon Comprehend
- Amazon Kendra
- Amazon Lex
- Amazon Translate
Answer: B
Amazon Kendra is an ML-powered enterprise document search service that understands natural language questions and returns direct answers (not just links) from indexed documents. Perfect for internal knowledge bases and HR policy queries.
Q4. A company processes medical records and needs human review of extracted data, but the data is confidential and cannot be shared with external workers. Which A2I workforce option should they use?
- Amazon Mechanical Turk workforce
- Vendor workforce from AWS Marketplace
- Private workforce (company employees)
- No workforce -- A2I doesn't support confidential data
Answer: C
A2I supports three workforce options. For confidential or sensitive data, the Private Workforce option uses your own employees as reviewers, ensuring data stays within the organization and maintaining compliance with privacy requirements.
Q5. How does Amazon Kendra improve search accuracy over time?
- By requiring manual tuning after each query
- Through incremental learning from user feedback and clicks
- By reindexing all documents daily
- It cannot improve -- accuracy is fixed at deployment
Answer: B
Kendra uses incremental learning to improve search result relevance over time by learning from user interaction feedback (clicks, ratings, query behavior). The more users interact with search results, the better Kendra becomes at understanding what they're looking for.
Medical AI Services (Transcribe Medical, Comprehend Medical, HealthScribe)
Overview:
AWS has healthcare-specific versions of Transcribe and Comprehend, plus a dedicated clinical documentation service. All are HIPAA-eligible -- they can be used in regulated healthcare environments.
---
Amazon Transcribe Medical:
A specialized version of Transcribe designed for medical speech recognition.
- Converts clinical audio (doctor dictations, patient calls, clinical discussions) into text
- Specialized vocabulary: medical terminology, drug names, procedures, disease names, body parts
- HIPAA-eligible for use in regulated healthcare environments
- Modes: real-time (microphone) and batch (file upload from S3)
Use Cases:
- Physicians dictating medical notes into an EHR (Electronic Health Record) system
- Transcribing drug safety call center recordings
- Converting clinical trial interview audio to text
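As a sketch of the batch mode, the snippet below builds the parameters a `start_medical_transcription_job` call would take via boto3's Transcribe client. The bucket and job names are placeholders; `Specialty` and `Type` are shown with two of the documented option values.

```python
# Sketch: parameters for a batch Transcribe Medical job. A real job is
# started with boto3.client("transcribe").start_medical_transcription_job(**job).
# Bucket names and the job name below are placeholders.

def build_medical_transcription_job(job_name: str, s3_audio_uri: str,
                                    output_bucket: str) -> dict:
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",               # US English medical audio
        "Media": {"MediaFileUri": s3_audio_uri},
        "OutputBucketName": output_bucket,     # transcript JSON lands here
        "Specialty": "PRIMARYCARE",            # medical specialty of the audio
        "Type": "DICTATION",                   # or "CONVERSATION" for multi-speaker audio
    }

job = build_medical_transcription_job(
    "dictation-001",
    "s3://example-audio-bucket/clinic/dictation-001.wav",  # placeholder URI
    "example-output-bucket",
)
print(job["Type"])  # DICTATION
```

`Type` maps directly to the use cases above: `DICTATION` for a physician dictating notes, `CONVERSATION` for patient calls or clinical discussions.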
---
Amazon Comprehend Medical:
A specialized NLP service for understanding and extracting information from clinical text.
- Understands unstructured medical notes, discharge summaries, test results, and case notes
- Extracts medical entities and their relationships:
- Medications (name, dosage, frequency, route)
- Medical conditions and diagnoses
- Anatomical terms
- Test results and lab values
- Detects PHI (Protected Health Information) -- more specific than general PII, covering HIPAA-regulated data
- Works with S3 batch input, Kinesis Data Firehose (real-time streaming), or direct API
Comprehend Medical Relationship Extraction:
Goes beyond entity detection to understand the RELATIONSHIP between terms -- e.g., linking a dosage and frequency to a specific drug name. This enables truly structured medical data from free-form clinical notes.
Common Workflow:
Clinical audio -> Amazon Transcribe Medical -> text -> Amazon Comprehend Medical -> structured medical entities, relationships, and PHI
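The relationship-extraction step can be sketched by flattening a Comprehend Medical-style response into medication records. The sample response below is hand-written to mimic the documented shape (entities with nested attributes); a real one would come from `boto3.client("comprehendmedical").detect_entities_v2(Text=...)` run on the Transcribe Medical output.

```python
# Sketch: linking dosage/frequency attributes to their medication entity,
# as Comprehend Medical's relationship extraction does. sample_response is
# a hand-written stand-in mimicking the DetectEntitiesV2 response shape.

sample_response = {
    "Entities": [
        {
            "Text": "metformin",
            "Category": "MEDICATION",
            "Attributes": [
                {"Type": "DOSAGE", "Text": "500 mg"},
                {"Type": "FREQUENCY", "Text": "twice daily"},
            ],
        }
    ]
}

def extract_medications(response: dict) -> list:
    """Turn medication entities plus their attributes into flat records."""
    records = []
    for entity in response["Entities"]:
        if entity["Category"] != "MEDICATION":
            continue
        attrs = {a["Type"]: a["Text"] for a in entity.get("Attributes", [])}
        records.append({"drug": entity["Text"],
                        "dosage": attrs.get("DOSAGE"),
                        "frequency": attrs.get("FREQUENCY")})
    return records

print(extract_medications(sample_response))
# [{'drug': 'metformin', 'dosage': '500 mg', 'frequency': 'twice daily'}]
```

This is the payoff of relationship extraction: "500 mg" and "twice daily" arrive already attached to "metformin", not as three disconnected strings.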
---
AWS HealthScribe:
A HIPAA-eligible service that automatically generates clinical documentation from patient-clinician conversations.
What It Does in One Step (from audio):
- Creates rich verbatim transcripts
- Identifies who is speaking (speaker role identification -- clinician vs. patient)
- Classifies dialogue segments (symptom description, medical history, assessment, plan)
- Extracts medical terms and concepts
- Generates structured clinical notes automatically
Output Sections:
- Chief complaint
- Medical history
- Assessment and diagnosis
- Treatment plan
Key Benefit:
Reduces physician documentation burden -- doctors spend less time typing notes and more time with patients. All documentation is auto-generated from the conversation recording.
Currently Available In: Select AWS Regions (accessed within the Amazon Transcribe console)
Service Comparison:
| Service | Input | Output | Specialty |
|---|---|---|---|
| Transcribe Medical | Audio | Raw medical text | Medical ASR |
| Comprehend Medical | Medical text | Structured entities, PHI | Medical NLP |
| AWS HealthScribe | Clinical conversation audio | Full clinical notes + transcript | End-to-end clinical documentation |
Key Terms
| Term | Definition |
|---|---|
| Amazon Transcribe Medical | A HIPAA-eligible version of Transcribe specialized for medical speech recognition -- converting clinical audio to text with deep understanding of medical terminology. |
| Amazon Comprehend Medical | A HIPAA-eligible NLP service that extracts medical entities (medications, conditions, procedures), their relationships, and PHI from clinical text. |
| PHI (Protected Health Information) | Health information that identifies a patient and is regulated under HIPAA. Comprehend Medical detects PHI automatically, enabling compliant data handling in healthcare workflows. |
| AWS HealthScribe | A HIPAA-eligible end-to-end service that generates structured clinical notes, transcripts, and medical summaries automatically from recorded patient-clinician conversations. |
| Speaker Role Identification (HealthScribe) | HealthScribe's ability to distinguish between the clinician and the patient in a recorded conversation, enabling properly attributed clinical documentation. |
- Transcribe Medical = SPEECH to TEXT for medical audio. HIPAA-eligible.
- Comprehend Medical = NLP for medical TEXT. Extracts drugs, conditions, dosages, relationships, and PHI.
- HealthScribe = END-TO-END clinical documentation from conversation audio. Generates full clinical notes.
- PHI != PII -- PHI is HIPAA-specific health information. Comprehend Medical detects PHI. Standard Comprehend detects general PII.
- The pipeline: clinical audio -> Transcribe Medical -> text -> Comprehend Medical -> structured data.
- All three services are HIPAA-eligible -- usable in regulated healthcare environments.
Practice Questions
Q1. A hospital wants to automatically create structured clinical notes from patient appointment recordings, including who said what, symptoms discussed, and the treatment plan -- all generated without the physician manually typing anything. Which AWS service is MOST appropriate?
- Amazon Transcribe Medical -- to convert the audio to a verbatim transcript
- Amazon Comprehend Medical -- to extract medical entities from the transcript
- AWS HealthScribe -- to generate full structured clinical notes from the patient-clinician conversation
- Amazon Mechanical Turk -- to have human transcriptionists review and structure the notes
Answer: C
AWS HealthScribe is specifically designed to generate complete structured clinical documentation from patient-clinician conversation audio in a single step -- producing transcripts with speaker attribution, medical entity extraction, and formatted clinical notes. Transcribe Medical and Comprehend Medical would need to be combined and require additional processing steps.
Q2. A pharmaceutical company needs to extract medication names, dosages, and frequencies from unstructured clinical trial notes. Which service is MOST appropriate?
- Amazon Comprehend
- Amazon Comprehend Medical
- Amazon Textract
- Amazon Rekognition
Answer: B
Amazon Comprehend Medical is specialized for medical NLP, extracting medical entities including medications (name, dosage, frequency, route), conditions, and procedures from clinical text. Standard Comprehend doesn't have this medical domain expertise.
Q3. What is the difference between PII (detected by Comprehend) and PHI (detected by Comprehend Medical)?
- They are the same thing
- PII is general personal info; PHI is HIPAA-specific health information
- PHI is for photos; PII is for text
- PII is for US data; PHI is for EU data
Answer: B
PII (Personally Identifiable Information) is general personal data like names and addresses. PHI (Protected Health Information) is specifically defined under HIPAA and includes health-related data. Comprehend Medical detects PHI, which is more specific than general PII.
Q4. What is the typical workflow for processing clinical audio into structured medical data?
- Amazon Polly -> Amazon Translate
- Amazon Transcribe Medical -> Amazon Comprehend Medical
- Amazon Rekognition -> Amazon Textract
- Amazon Lex -> Amazon Personalize
Answer: B
The standard medical audio processing pipeline is: Amazon Transcribe Medical (converts clinical audio to text) -> Amazon Comprehend Medical (extracts structured medical entities, relationships, and PHI from the text).
Q5. Are Amazon Transcribe Medical, Comprehend Medical, and HealthScribe suitable for use in regulated healthcare environments?
- No -- they cannot handle sensitive health data
- Yes -- all three are HIPAA-eligible services
- Only Comprehend Medical is HIPAA-eligible
- Only when used with Amazon Macie
Answer: B
All three services -- Amazon Transcribe Medical, Amazon Comprehend Medical, and AWS HealthScribe -- are HIPAA-eligible, meaning they can be used in regulated healthcare environments when properly configured as part of a HIPAA-compliant architecture.
Amazon EC2 for AI -- Trainium and Inferentia
EC2 Overview (AI Perspective):
Amazon EC2 (Elastic Compute Cloud) provides virtual servers in the cloud. While most AWS AI services are fully managed, some organizations build and train their own large models directly on EC2 instances. For this, AWS has created specialized ML hardware.
Standard GPU-Based EC2 Instances:
For general ML workloads, GPU instances provide the parallel compute needed for deep learning.
| Family | Best For |
|---|---|
| P3, P4, P5 | High-performance ML training (NVIDIA GPUs) |
| G3, G4, G5, G6 | ML inference and some training (NVIDIA GPUs) |
---
AWS Trainium -- Custom ML Training Chips:
AWS-designed ML chips built specifically for training large deep learning models.
- Optimized for training models with 100 billion+ parameters
- EC2 instance type: Trn1 (e.g., trn1.32xlarge has 16 Trainium accelerators)
- Advertised benefit: up to 50% cost reduction vs. comparable GPU-based training instances
- Use when: training very large custom models directly on EC2 (not using SageMaker)
---
AWS Inferentia -- Custom ML Inference Chips:
AWS-designed chips optimized for running ML inference (predictions) at scale and low cost.
- EC2 instance types: Inf1, Inf2
- Benefit: up to 4x higher throughput than comparable GPU instances
- Benefit: up to 70% cost reduction vs. GPU-based inference instances
- Use when: serving a trained model at high volume and need cost-efficient, fast inference
---
Trainium vs. Inferentia:
| Chip | EC2 Type | Use Case | Key Benefit |
|---|---|---|---|
| AWS Trainium | Trn1 | TRAINING deep learning models | ~50% cost reduction vs. GPU training |
| AWS Inferentia | Inf1, Inf2 | INFERENCE (predictions) from trained models | ~4x throughput, ~70% cost reduction vs. GPU inference |
Environmental Note (Exam):
Trainium and Inferentia instances have the LOWEST environmental footprint of all ML compute options on AWS -- because they are the most energy-efficient per unit of ML work performed.
Decision Framework:
- Using a managed service (Bedrock, Rekognition, etc.)? -> No EC2 needed.
- Training a large custom model? -> Trn1 (Trainium) or P-family GPU instances.
- Serving a trained model at scale? -> Inf1/Inf2 (Inferentia) or G-family GPU instances.
- Need lowest cost? -> Trainium for training, Inferentia for inference.
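The decision framework above can be encoded as a small lookup, useful as a memorization aid. The labels are this guide's shorthand, not any AWS API.

```python
# Sketch: the hardware decision framework as a function. Workload labels
# and return strings are this guide's shorthand, not an AWS API.

def pick_compute(workload: str, cost_optimized: bool) -> str:
    if workload == "managed_service":
        return "none (fully managed, no EC2 needed)"
    if workload == "training":
        # Trainium for cost, P-family NVIDIA GPUs for max performance/CUDA
        return "Trn1 (Trainium)" if cost_optimized else "P-family GPU"
    if workload == "inference":
        # Inferentia for cost/throughput, G-family NVIDIA GPUs otherwise
        return "Inf1/Inf2 (Inferentia)" if cost_optimized else "G-family GPU"
    raise ValueError(f"unknown workload: {workload}")

print(pick_compute("training", cost_optimized=True))    # Trn1 (Trainium)
print(pick_compute("inference", cost_optimized=False))  # G-family GPU
```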
Key Terms
| Term | Definition |
|---|---|
| AWS Trainium | AWS-designed ML accelerator chips for training large deep learning models (100B+ parameters). Available on Trn1 EC2 instances with up to 50% cost savings vs. GPU training instances. |
| AWS Inferentia | AWS-designed ML inference chips for running trained models at scale. Available on Inf1 and Inf2 EC2 instances with up to 4x throughput and 70% cost savings vs. GPU inference instances. |
| Trn1 Instance | An EC2 instance type powered by AWS Trainium chips, optimized for training large-scale deep learning models at lower cost than GPU instances. |
| Inf1 / Inf2 Instance | EC2 instance types powered by AWS Inferentia chips, optimized for cost-efficient, high-throughput ML inference at scale. |
- Trainium = TRAINING. Inferentia = INFERENCE. The names hint at their purpose.
- Trn1 = Trainium instances. Inf1/Inf2 = Inferentia instances. Know the instance type naming.
- Trainium = 50% cost savings vs. GPU training. Inferentia = 4x throughput + 70% cost savings vs. GPU inference.
- Both Trainium and Inferentia have the LOWEST ENVIRONMENTAL FOOTPRINT of AWS ML hardware.
- P-family EC2 = GPU-based training (NVIDIA). G-family = GPU-based inference. Trn1 = Trainium. Inf1/Inf2 = Inferentia.
- If exam asks about cost-optimized ML hardware on EC2 -> Trainium (training) or Inferentia (inference).
Practice Questions
Q1. A research team is training a 200-billion parameter language model directly on Amazon EC2. They need to minimize training costs while maintaining high performance. Which EC2 instance family should they use?
- G5 instances -- for GPU-accelerated training with NVIDIA A10G GPUs
- Inf2 instances -- for cost-efficient inference of large language models
- Trn1 instances -- powered by AWS Trainium chips, offering up to 50% cost savings for large model training
- P4 instances -- for highest absolute training performance with NVIDIA A100 GPUs
Answer: C
AWS Trainium chips (Trn1 instances) are specifically optimized for training large deep learning models at up to 50% lower cost than comparable GPU instances. For a 200B parameter model training workload where cost optimization is the priority, Trn1 is the correct choice.
Q2. A company has a trained recommendation model they need to serve to 10 million users daily with very low latency and the lowest possible cost per inference. Which EC2 instance type is MOST appropriate?
- Trn1 -- for high-throughput model training
- Inf2 -- powered by AWS Inferentia, providing up to 4x throughput and 70% cost savings for inference
- P5 -- for maximum GPU compute in inference workloads
- G6 -- for balanced training and inference workloads
Answer: B
AWS Inferentia chips (Inf1/Inf2 instances) are optimized for ML inference at scale -- delivering up to 4x the throughput and 70% cost savings compared to GPU-based inference instances. For high-volume, cost-sensitive inference workloads, Inferentia is the purpose-built solution.
Q3. A company wants to minimize their environmental impact when running ML workloads on AWS. Which hardware option has the lowest environmental footprint?
- Standard EC2 instances
- P5 GPU instances
- AWS Trainium and Inferentia chips
- G6 instances
Answer: C
AWS Trainium and Inferentia have the lowest environmental footprint of all ML compute options on AWS because they are the most energy-efficient per unit of ML work performed. They're purpose-built for ML, unlike general-purpose GPUs.
Q4. What is the primary difference between AWS Trainium and AWS Inferentia?
- Trainium is for images; Inferentia is for text
- Trainium is for TRAINING large models; Inferentia is for INFERENCE (predictions)
- They are the same chip with different names
- Trainium is older; Inferentia is newer
Answer: B
AWS Trainium is designed for TRAINING large deep learning models (100B+ parameters) with up to 50% cost savings. AWS Inferentia is designed for running INFERENCE at scale with up to 4x throughput and 70% cost savings. Different chips for different phases of the ML lifecycle.
Q5. Which EC2 instance type uses AWS Trainium chips?
- P4
- G5
- Trn1
- Inf2
Answer: C
Trn1 instances are powered by AWS Trainium chips, optimized for training large-scale deep learning models. The naming convention is: Trn = Trainium (training), Inf = Inferentia (inference).
Q6. When would you choose P5 GPU instances over Trn1 Trainium instances for ML training?
- When cost optimization is the top priority
- When you need maximum absolute performance and compatibility with NVIDIA CUDA libraries
- When environmental impact is the main concern
- P5 is always better than Trn1
Answer: B
P5 instances with NVIDIA GPUs provide maximum training performance and full compatibility with CUDA libraries and existing GPU-optimized code. Trn1 offers better cost efficiency but may require code adaptation. Choose based on whether cost or compatibility is your priority.