AWS AI Practitioner - AWS Managed AI Services
Why AWS Managed AI Services?
Overview:
Before Amazon Bedrock existed, AWS built a portfolio of purpose-built, pre-trained AI services -- each solving a specific, narrow use case. These services are still heavily tested on the exam and widely used in production.
What Makes Managed AI Services Different from Bedrock:
| Feature | Managed AI Services | Amazon Bedrock |
|---|---|---|
| Purpose | Narrow, specific use case (e.g., transcription only) | General-purpose GenAI -- text, images, code |
| Model choice | Fixed -- AWS manages the model | Choose from 30+ Foundation Models |
| Setup | Zero configuration -- API ready | Requires model selection and prompt design |
| Training | Pre-trained -- works out of the box (some support customization) | Fine-tunable FMs |
| Pricing | Pay per request (tokens, minutes, pages, API calls) | Pay per token (on-demand) or provisioned throughput |
Why Use Managed Services?
- Availability -- deployed across multiple AWS Regions, always on
- Redundancy -- built across multiple Availability Zones (fault-tolerant)
- Performance -- specialized CPUs/GPUs embedded in the service infrastructure
- Cost efficiency -- pay only for what you use; no server over-provisioning needed
- No ML expertise required -- AWS handles model training, updates, and infrastructure
Service Map -- What Does What:
| Category | Service | Primary Function |
|---|---|---|
| Text/Document Processing | Amazon Comprehend | NLP -- entities, sentiment, PII, classification |
| Translation | Amazon Translate | Language translation |
| Speech-to-Text | Amazon Transcribe | Audio -> Text (ASR) |
| Text-to-Speech | Amazon Polly | Text -> Audio |
| Vision | Amazon Rekognition | Image/video analysis |
| Chatbots | Amazon Lex | Conversational AI chatbot builder |
| Recommendations | Amazon Personalize | Personalized product/content recommendations |
| Document Extraction | Amazon Textract | Extract text, forms, tables from documents |
| Document Search | Amazon Kendra | ML-powered enterprise document search |
| Human Tasks | Amazon Mechanical Turk | Crowdsourced human task workforce |
| Human Review | Amazon Augmented AI (A2I) | Human review of low-confidence ML predictions |
| Medical NLP | Amazon Comprehend Medical | NLP for medical text and PHI detection |
| Medical Speech | Amazon Transcribe Medical | Speech-to-text for medical terminology |
| Clinical Notes | AWS HealthScribe | Auto-generate clinical notes from patient conversations |
| AI Hardware | AWS Trainium / AWS Inferentia | Specialized ML training and inference chips on EC2 |
Key Terms
| Term | Definition |
|---|---|
| AWS Managed AI Services | Pre-trained, purpose-built AI services from AWS designed to solve specific AI tasks (e.g., translation, transcription, vision) without requiring customers to train or manage ML models. |
| Pre-trained Model | A machine learning model already trained by AWS on large datasets. Customers use it via API without any model training -- works out of the box for its specific task. |
- Managed AI services = narrow, specific purpose. Bedrock = general GenAI. Know the distinction.
- These services are pay-per-use (per API call, per minute of audio, per page) -- no server management needed.
- The exam frequently presents a scenario and asks which service to use -- the service map above is your key reference.
- Many services support CUSTOMIZATION on top of the pre-trained base (custom classifiers in Comprehend, custom vocabularies in Transcribe, custom labels in Rekognition).
Practice Questions
Q1. A startup wants to add sentiment analysis to their customer feedback application but has no ML expertise and needs to launch quickly. Which approach should they take?
- Train a custom model on Amazon SageMaker
- Use Amazon Comprehend, a pre-trained managed AI service
- Build a neural network from scratch on EC2
- Use Amazon Bedrock to create a custom foundation model
Answer: B
Amazon Comprehend is a managed AI service that provides sentiment analysis out-of-the-box with no ML expertise required. It's pre-trained and ready to use via API, perfect for quick deployment without custom model training.
Q2. What is the PRIMARY advantage of AWS Managed AI Services compared to training custom ML models?
- They provide more customization options
- They require no ML expertise and work out-of-the-box
- They are always free to use
- They support more programming languages
Answer: B
AWS Managed AI Services are pre-trained and API-ready, requiring no ML expertise to implement. They solve specific AI tasks (translation, transcription, vision) without customers needing to train or manage ML models themselves.
Q3. A company needs to process images for object detection AND translate customer reviews into multiple languages. How many AWS Managed AI Services are needed?
- One -- Amazon Bedrock handles both tasks
- Two -- Amazon Rekognition for images and Amazon Translate for translation
- One -- Amazon Comprehend handles all text and image tasks
- Three -- one for each task plus a coordination service
Answer: B
Each AWS Managed AI Service has a specific, narrow purpose. Amazon Rekognition handles image analysis (object detection), while Amazon Translate handles language translation. These are complementary services, each solving one specific task.
Q4. Which pricing model applies to AWS Managed AI Services?
- Fixed monthly subscription per service
- Pay-per-use based on API calls, minutes processed, or pages analyzed
- Upfront annual commitment required
- Free tier only with no paid options
Answer: B
AWS Managed AI Services use pay-per-use pricing -- you pay based on consumption (per API call, per minute of audio transcribed, per page of documents analyzed). There's no server provisioning or capacity management required.
Q5. A financial services company needs to extract data from invoices AND detect toxic content in customer communications. Which services should they use?
- Amazon Textract for document extraction and Amazon Comprehend for toxicity detection
- Amazon Kendra for both tasks
- Amazon Rekognition for both document and content analysis
- Amazon SageMaker to build custom models for both tasks
Answer: A
Amazon Textract extracts structured data (forms, tables, key-value pairs) from documents like invoices. Amazon Comprehend provides NLP capabilities including sentiment analysis and content classification for detecting toxic or inappropriate text in communications.
Amazon Comprehend
What is Amazon Comprehend?
Amazon Comprehend is a fully managed, serverless Natural Language Processing (NLP) service that uses machine learning to discover insights, relationships, and meaning in unstructured text.
Core Capabilities (Out-of-the-Box):
| Capability | What It Does | Example |
|---|---|---|
| Named Entity Recognition (NER) | Identify people, places, organizations, dates, quantities in text | 'Zhang Wei' -> Person, 'July 31st' -> Date |
| Sentiment Analysis | Determine if text is positive, negative, neutral, or mixed | Customer review -> 'Negative, 85% confidence' |
| Key Phrase Extraction | Pull out the most important phrases from text | 'minimum payment due' from a billing letter |
| PII Detection | Identify personally identifiable information in text | Names, phone numbers, credit card numbers, SSNs |
| Language Detection | Identify the language of the input text | 'English, 99% confidence' |
| Targeted Sentiment | Sentiment about specific entities in text | How customers feel about a specific product feature |
| Syntax Analysis | Identify parts of speech (noun, verb, adjective) | Grammatical parsing of text |
Supported Document Types:
Text, PDF, Word (.docx), images
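The out-of-the-box capabilities above are plain API calls. A minimal sketch of sentiment analysis with boto3 (the AWS call is commented out and assumes configured credentials; the sample response below is illustrative, not real output):

```python
# import boto3
# comprehend = boto3.client("comprehend")
# response = comprehend.detect_sentiment(
#     Text="The checkout process was confusing and slow.",
#     LanguageCode="en",
# )

def dominant_sentiment(response):
    """Return (label, confidence) from a DetectSentiment response."""
    label = response["Sentiment"]  # POSITIVE | NEGATIVE | NEUTRAL | MIXED
    return label, response["SentimentScore"][label.capitalize()]

# Abbreviated shape of a DetectSentiment response:
sample = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {"Positive": 0.01, "Negative": 0.85, "Neutral": 0.12, "Mixed": 0.02},
}
print(dominant_sentiment(sample))  # ('NEGATIVE', 0.85)
```

The response always carries per-label confidence scores alongside the winning label, which is what lets you set your own confidence thresholds before acting on a result.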
Custom Capabilities (Requires Training Data):
1. Custom Classification:
Train Comprehend to categorize documents into YOUR own defined categories.
- You provide labeled examples (minimum ~10 per class)
- Store training data (CSV format) in Amazon S3
- Comprehend trains a custom classifier internally
- Available as: real-time, synchronous batch, or asynchronous analysis
- Use case: Route incoming support emails to Billing, Technical Support, Complaint, or Feature Request categories
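A sketch of the two-column CSV (label, then document text) described above; the category names and example texts here are hypothetical:

```python
import csv, io

# Hypothetical labeled examples in (CLASS, TEXT) form.
examples = [
    ("BILLING", "I was charged twice for my subscription this month."),
    ("TECH_SUPPORT", "The app crashes every time I open the settings page."),
    ("COMPLAINT", "The agent I spoke with yesterday was unhelpful."),
]

buf = io.StringIO()
csv.writer(buf).writerows(examples)
training_csv = buf.getvalue()
# Upload this file to S3, then point Comprehend's custom classifier
# training job at that S3 location.
print(training_csv.splitlines()[0])
```

Remember the exam-relevant detail: the training data lives in S3 as CSV, and you need roughly 10+ labeled examples per class.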
2. Custom Entity Recognition:
Train Comprehend to identify entities specific to YOUR business -- not just generic person/place/date.
- Provide a list of target entities and documents containing them
- Comprehend learns what your specific entity looks like in context
- Use case: Automatically extract policy numbers, product model numbers, or customer escalation phrases from documents
Analysis Modes:
- Real-time -- synchronous; immediate results for single documents
- Asynchronous (Batch) -- submit large volumes of documents from S3; process offline
Key Exam Scenario:
Customer sends emails -> Comprehend classifies them as billing/support/complaints -> route to correct team automatically.
Key Terms
| Term | Definition |
|---|---|
| Amazon Comprehend | A fully managed NLP service that extracts entities, key phrases, sentiment, PII, and language from unstructured text. Supports custom classifiers and custom entity recognizers. |
| Named Entity Recognition (NER) | Comprehend's ability to automatically identify and categorize specific entities in text -- people, organizations, locations, dates, quantities -- without any training required. |
| Sentiment Analysis | Comprehend's capability to determine whether text has a positive, negative, neutral, or mixed emotional tone. Used to analyze customer feedback and support interactions. |
| PII Detection (Comprehend) | Comprehend's ability to identify personally identifiable information -- names, credit card numbers, phone numbers, SSNs -- in text, enabling automated redaction or compliance workflows. |
| Custom Classification (Comprehend) | A Comprehend feature where users provide labeled training examples to teach Comprehend to categorize documents into custom business-specific categories (e.g., billing, support, complaints). |
| Custom Entity Recognition (Comprehend) | A Comprehend feature that allows training the model to recognize business-specific entities (e.g., policy numbers, product codes) by providing labeled examples of those entities in context. |
- Comprehend = NLP service. Use it for: sentiment, entities, PII, key phrases, language detection, document classification.
- Custom Classification = YOU define the categories. Comprehend learns from your labeled examples.
- Custom Entity Recognition = YOU define the entity types. Used for business-specific terms not in standard NER.
- PII detection is a built-in, out-of-the-box Comprehend feature -- no training needed.
- Comprehend Medical = separate service for medical text. Comprehend = general text.
- Training data for custom models -> stored in Amazon S3 in CSV format.
Practice Questions
Q1. A company receives thousands of customer support emails daily and wants to automatically route them to the correct team (billing, technical support, or general complaints) without manual review. Which Comprehend feature enables this?
- Named Entity Recognition -- to identify the customer's name in each email
- Custom Classification -- to categorize emails into custom business-defined categories
- Sentiment Analysis -- to route negative emails to the complaints team
- PII Detection -- to identify customer accounts in the emails
Answer: B
Custom Classification in Amazon Comprehend allows you to define your own document categories (billing, technical support, complaints) and train Comprehend with labeled examples of each. Once trained, it automatically classifies incoming emails into the correct category for routing.
Q2. A legal firm wants to process thousands of contracts stored in S3 and automatically redact any personally identifiable information before sharing them externally. Which Amazon Comprehend capability is MOST relevant?
- Custom Entity Recognition -- to identify PII as a custom entity type
- Key Phrase Extraction -- to extract important legal terms
- PII Detection -- built-in capability to identify names, SSNs, credit card numbers, and other PII in text
- Custom Classification -- to classify contracts by PII risk level
Answer: C
Amazon Comprehend includes a built-in PII Detection capability that identifies personally identifiable information (names, phone numbers, SSNs, credit card numbers, etc.) in text without any custom training. Combined with batch processing from S3, this enables automated PII redaction workflows.
Q3. A news organization wants to automatically identify and tag all people, organizations, and locations mentioned in articles. Which Comprehend capability provides this out-of-the-box?
- Custom Classification
- Named Entity Recognition (NER)
- Sentiment Analysis
- Key Phrase Extraction
Answer: B
Named Entity Recognition (NER) is a built-in Comprehend capability that automatically identifies and categorizes entities like people, organizations, locations, dates, and quantities in text -- no training required.
Q4. A retail company wants to analyze customer product reviews to understand if customers feel positive, negative, or neutral about their purchases. Which Comprehend feature should they use?
- Key Phrase Extraction
- Language Detection
- Sentiment Analysis
- Custom Entity Recognition
Answer: C
Sentiment Analysis determines whether text has a positive, negative, neutral, or mixed emotional tone. This is perfect for analyzing customer reviews to understand overall customer satisfaction and feelings about products.
Q5. What format must training data be in when creating a Custom Classification model in Amazon Comprehend?
- JSON stored in DynamoDB
- CSV stored in Amazon S3
- XML stored in Amazon RDS
- Parquet stored in Redshift
Answer: B
Amazon Comprehend Custom Classification requires training data to be stored in CSV format in Amazon S3. The CSV contains labeled examples that Comprehend uses to learn your custom document categories.
Q6. A company receives documents in multiple languages and needs to detect which language each document is written in before processing. Which Comprehend capability handles this?
- Custom Entity Recognition
- Targeted Sentiment
- Language Detection
- Syntax Analysis
Answer: C
Language Detection is a built-in Comprehend capability that identifies the language of input text. This is useful for routing documents to appropriate processing pipelines based on their language.
Amazon Translate
What is Amazon Translate?
Amazon Translate is a fully managed neural machine translation service that provides accurate, natural-sounding language translation at scale.
Core Capabilities:
- Translate text between a wide range of language pairs
- Translate entire documents (plain text, HTML, .docx)
- Batch translation of many files at once using S3 as input/output
- Supports real-time API calls for individual translation requests
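A real-time translation request might look like this sketch (the AWS call is commented out and assumes configured credentials; the sample response is illustrative):

```python
# import boto3
# translate = boto3.client("translate")
# response = translate.translate_text(
#     Text="Hello, how can I help you today?",
#     SourceLanguageCode="auto",   # let Translate detect the source language
#     TargetLanguageCode="fr",
# )

def unpack(response):
    """Return (translated_text, detected_source_language)."""
    return response["TranslatedText"], response["SourceLanguageCode"]

# Abbreviated shape of a TranslateText response:
sample = {
    "TranslatedText": "Bonjour, comment puis-je vous aider ?",
    "SourceLanguageCode": "en",
    "TargetLanguageCode": "fr",
}
print(unpack(sample)[1])  # en
```

Passing 'auto' as the source language is what lets a single pipeline handle documents in mixed, unknown languages.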
Advanced Features:
Custom Terminology:
- Define how specific terms should be translated
- Essential for brand names, product names, character names, or technical jargon that need consistent translation
- Provided as a dictionary file in CSV, TSV, or TMX format
- Example: Ensure 'EC2' is always translated as 'EC2' and not as a generic phrase
Parallel Data (Style Customization):
- Customize the STYLE or FORMALITY of translation
- Example: 'How are you?' in French for an informal context -> 'Comment ça va?' (casual). For a law office -> 'Comment allez-vous?' (formal)
- Use parallel data to define input examples and their desired translated style
Pricing:
Pay per character translated -- billed based on the volume of text processed.
Use Cases:
- Localizing a website or app for international users
- Translating large batches of customer support tickets
- Multilingual content generation
- Real-time translation in a chat application
Exam Summary:
Amazon Translate = neural machine translation service. Key customization features: Custom Terminology (specific term translations) and Parallel Data (translation style/formality).
Key Terms
| Term | Definition |
|---|---|
| Amazon Translate | A fully managed neural machine translation service for accurate, natural-sounding language translation at scale. Supports real-time, document, and batch translation. |
| Custom Terminology (Translate) | A dictionary that ensures specific terms (brand names, product names, acronyms) are translated consistently and correctly, overriding the default translation. |
| Parallel Data (Translate) | Example input-output translation pairs that customize the style or formality of Amazon Translate's output -- e.g., formal vs. informal language style. |
- Translate = language translation service. Simple and specific -- if the scenario mentions translating content to another language -> Translate.
- Custom Terminology = controls HOW specific terms are translated (brand names, product names).
- Parallel Data = controls the STYLE of translation (formal vs. informal register).
- Batch translation uses S3 as input and output -- useful for translating large document collections.
Practice Questions
Q1. A global e-commerce company wants to translate its product catalog into 12 languages but needs to ensure its brand name 'NovaBike' is never translated into local equivalents -- it must always appear as 'NovaBike' in all languages. Which Translate feature enables this?
- Parallel Data -- to provide formal translation style examples
- Custom Terminology -- to define that 'NovaBike' should always remain untranslated
- Batch Translation -- to process all catalog items simultaneously
- Language Detection -- to identify the source language of each product description
Answer: B
Custom Terminology allows you to define specific terms that should be translated in a particular way -- or not translated at all. Brand names like 'NovaBike' that must appear consistently across all languages should be added to a Custom Terminology dictionary.
Q2. A law firm needs to translate client communications into French but requires a formal tone appropriate for legal correspondence. Which Amazon Translate feature enables style customization?
- Custom Terminology
- Parallel Data
- Batch Translation
- Real-time API
Answer: B
Parallel Data allows you to customize the style and formality of Amazon Translate's output. By providing example translations in your desired style (formal legal language), Translate learns to produce appropriately formal translations.
Q3. How is Amazon Translate priced?
- Fixed monthly fee per language pair
- Per character translated
- Per document regardless of size
- Free for all use cases
Answer: B
Amazon Translate is billed based on the number of characters translated. This pay-per-character model means you only pay for the actual volume of text processed.
Q4. A company needs to translate 10,000 product descriptions stored in S3 to multiple languages. What is the most efficient approach?
- Call the real-time API 10,000 times sequentially
- Use Amazon Translate batch translation with S3 as input and output
- Use Amazon Comprehend to classify and translate
- Build a custom translation model in SageMaker
Answer: B
Amazon Translate supports batch translation of many files at once using S3 as both input and output. This is the most efficient approach for processing large volumes of documents compared to sequential API calls.
Q5. What is the difference between Custom Terminology and Parallel Data in Amazon Translate?
- Custom Terminology controls pronunciation; Parallel Data controls spelling
- Custom Terminology controls specific term translations; Parallel Data controls translation style/formality
- Both are identical features with different names
- Custom Terminology is for batch; Parallel Data is for real-time
Answer: B
Custom Terminology defines HOW specific terms should be translated (or kept untranslated, like brand names). Parallel Data customizes the STYLE of translation (formal vs. informal register) by providing example input-output translation pairs.
Amazon Transcribe
What is Amazon Transcribe?
Amazon Transcribe is a fully managed Automatic Speech Recognition (ASR) service that converts audio speech into accurate text using deep learning.
Core Capabilities:
- Convert spoken audio (microphone, phone call, media file) to text
- Supports real-time streaming transcription
- Supports batch transcription of audio files from S3
- Automatic punctuation and formatting
Key Features:
PII Redaction:
- Automatically detects and removes PII from transcription output
- Redacts: names, phone numbers, SSNs, credit card numbers, dates of birth
- Use case: Transcribing customer support calls while removing sensitive customer data
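A batch transcription job with PII redaction enabled might be configured like this sketch (the AWS call is commented out; the bucket, file, and job names are hypothetical):

```python
# Parameters for a batch job that transcribes a support call from S3
# and redacts PII in the transcript output.
job_params = {
    "TranscriptionJobName": "support-call-0001",
    "Media": {"MediaFileUri": "s3://example-bucket/calls/call-0001.wav"},
    "MediaFormat": "wav",
    "LanguageCode": "en-US",
    "ContentRedaction": {
        "RedactionType": "PII",        # redact names, card numbers, SSNs, etc.
        "RedactionOutput": "redacted", # keep only the redacted transcript
    },
}
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.start_transcription_job(**job_params)
print(job_params["ContentRedaction"]["RedactionType"])  # PII
```

This maps directly onto the exam scenario 'transcribe customer calls and remove PII': one job configuration, no separate post-processing step.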
Automatic Language Identification:
- Detect and handle multiple languages within the same audio stream
- Example: Seamlessly transcribe a conversation that switches between English and French
Improving Transcription Accuracy:
| Feature | What It Improves | How |
|---|---|---|
| Custom Vocabulary | Recognition of specific WORDS | Provide a list of domain-specific words, acronyms, brand names, and pronunciation hints |
| Custom Language Model | Understanding of CONTEXT | Train Transcribe on domain-specific text data so it understands industry terminology in context |
Use BOTH together for highest accuracy.
Example: Without customization, 'AWS Microservices' might be transcribed as 'USA micro services'. With a Custom Vocabulary for 'AWS' and a Custom Language Model trained on IT text, it correctly transcribes 'AWS Microservices'.
Toxicity Detection:
- Detects toxic speech in audio using BOTH speech cues (tone, pitch) AND text cues (profanity, hate speech)
- Categories detected: sexual harassment, hate speech, threats, abuse, profanity, insults, graphic content
- The combination of audio tone analysis + text analysis makes this more powerful than text-only detection
Use Cases:
- Transcribing customer service calls for quality assurance
- Automated closed captioning and subtitling
- Creating searchable archives from recorded meetings
- Generating metadata for media assets
Exam Tip:
Amazon Transcribe = speech-to-text (audio -> text). Amazon Polly = text-to-speech (text -> audio). These are OPPOSITES.
Key Terms
| Term | Definition |
|---|---|
| Amazon Transcribe | A fully managed ASR (Automatic Speech Recognition) service that converts speech to text using deep learning. Supports real-time and batch transcription. |
| Automatic Speech Recognition (ASR) | The deep learning technology behind Amazon Transcribe that converts audio waveforms to text accurately and efficiently. |
| Custom Vocabulary (Transcribe) | A user-provided list of domain-specific words, acronyms, and brand names that improves recognition of specific TERMS in transcription. |
| Custom Language Model (Transcribe) | A model trained on domain-specific text data that improves Transcribe's understanding of CONTEXT -- how specific terms are used in a particular industry or domain. |
| Toxicity Detection (Transcribe) | A Transcribe feature that uses both audio cues (tone, pitch) and text cues (profanity, hate speech) to detect toxic speech across categories: harassment, threats, hate speech, abuse, etc. |
| PII Redaction (Transcribe) | Transcribe's ability to automatically detect and remove personally identifiable information from transcription output, enabling compliant handling of sensitive conversations. |
- Transcribe = SPEECH to TEXT. Polly = TEXT to SPEECH. Opposites -- memorize both directions.
- Custom Vocabulary = specific WORDS. Custom Language Model = domain CONTEXT. Both together = maximum accuracy.
- Toxicity Detection = uses BOTH audio (tone/pitch) + text (words) -- not just text analysis.
- PII Redaction in Transcribe automatically removes names, phone numbers, SSNs from transcription output.
- Auto Language Identification allows Transcribe to handle MULTILINGUAL audio in a single stream.
- Use case: 'transcribe customer calls and remove PII' -> Amazon Transcribe with PII redaction enabled.
Practice Questions
Q1. A healthcare company transcribes patient calls using Amazon Transcribe but finds that medical drug names and procedure codes are frequently transcribed incorrectly. Which feature(s) should they enable?
- PII Redaction -- to remove drug names from the transcription
- Custom Vocabulary and Custom Language Model -- to improve recognition of medical terms and their context
- Automatic Language Identification -- to detect which language each drug name is from
- Toxicity Detection -- to flag inappropriate language in patient calls
Answer: B
Custom Vocabulary adds specific medical drug names and procedure codes to Transcribe's recognition dictionary. Custom Language Model trains Transcribe on medical domain text so it understands the context in which these terms appear. Together, they deliver the highest transcription accuracy for specialized medical terminology.
Q2. A customer service center wants to transcribe support calls while automatically removing customer names, phone numbers, and credit card numbers from the output. Which Transcribe feature enables this?
- Custom Vocabulary
- Automatic Language Identification
- PII Redaction
- Toxicity Detection
Answer: C
PII Redaction automatically detects and removes personally identifiable information (names, phone numbers, SSNs, credit card numbers) from transcription output, enabling compliant handling of sensitive customer conversations.
Q3. What is the relationship between Amazon Transcribe and Amazon Polly?
- They are the same service with different names
- They are opposites -- Transcribe converts speech to text; Polly converts text to speech
- Transcribe is for video; Polly is for audio only
- Both convert text to speech in different languages
Answer: B
Amazon Transcribe and Amazon Polly are opposite services. Transcribe uses ASR (Automatic Speech Recognition) to convert speech audio into text. Polly uses TTS (Text-to-Speech) to convert written text into spoken audio.
Q4. A moderation team wants to detect toxic speech in recorded audio, using both the words spoken AND the tone of voice. Which Transcribe feature provides this?
- PII Redaction
- Custom Language Model
- Toxicity Detection
- Custom Vocabulary
Answer: C
Toxicity Detection analyzes BOTH audio cues (tone, pitch) AND text cues (profanity, hate speech) to detect toxic speech. This dual-signal approach is more powerful than text-only toxicity detection.
Q5. A global company has call recordings where speakers switch between English and Spanish within the same conversation. How can Amazon Transcribe handle this?
- It cannot -- separate recordings are needed for each language
- Automatic Language Identification can detect and handle multiple languages in the same audio stream
- Custom Vocabulary must include words from both languages
- PII Redaction automatically translates between languages
Answer: B
Automatic Language Identification allows Amazon Transcribe to detect and handle multiple languages within the same audio stream, seamlessly transcribing conversations that switch between languages.
Amazon Polly
What is Amazon Polly?
Amazon Polly is a fully managed text-to-speech (TTS) service that converts written text into natural-sounding human speech using deep learning. It's the OPPOSITE of Amazon Transcribe.
Voice Engines (Newest to Oldest):
| Engine | Quality | Best For |
|---|---|---|
| Generative | Most expressive, adaptive speech using GenAI | High-quality, natural conversational applications |
| Long-form | High quality for longer content | Audiobooks, long-form narration |
| Neural | Human-like, more natural than standard | Most production use cases |
| Standard | Basic TTS, original engine | Legacy/simple use cases |
Advanced Features:
Lexicons:
- Define how specific text strings should be PRONOUNCED (not translated)
- Example: When Polly sees 'AWS', speak 'Amazon Web Services'
- Example: When Polly sees 'W3C', speak 'World Wide Web Consortium'
- Different from Translate's Custom Terminology -- Lexicons are about pronunciation, not translation
SSML (Speech Synthesis Markup Language):
- XML-based markup language that gives fine-grained control over speech output
- Controls: pauses, emphasis, whisper, pronunciation of abbreviations, speaking rate, pitch
- Example: '<speak>Hello <break time="1s"/> how are you?</speak>' -> produces a 1-second pause between 'Hello' and 'how are you?'
- Enables highly customized, natural-sounding outputs for specific contexts
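Feeding SSML to Polly means marking the input as SSML rather than plain text. A sketch (the AWS call is commented out and assumes configured credentials; the voice choice is illustrative):

```python
# SSML input: a 1-second pause between the greeting and the question.
ssml = '<speak>Hello <break time="1s"/> how are you?</speak>'

# import boto3
# polly = boto3.client("polly")
# audio = polly.synthesize_speech(
#     Text=ssml,
#     TextType="ssml",       # tell Polly the input is SSML, not plain text
#     VoiceId="Joanna",
#     Engine="neural",
#     OutputFormat="mp3",
# )
# with open("hello.mp3", "wb") as f:
#     f.write(audio["AudioStream"].read())
print("<break" in ssml)  # True
```

If TextType is omitted, Polly treats the input as plain text and would read the markup tags aloud instead of interpreting them.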
Speech Marks:
- Metadata that tells you WHERE in the audio a specific word or sentence starts and ends
- Use cases: lip-syncing animated characters, highlighting words in a karaoke-style display as they are spoken
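Speech marks arrive as newline-delimited JSON, one object per word, giving the millisecond offset and character span of each word. A parsing sketch (the two lines below are sample data in that shape, not real Polly output):

```python
import json

# Sample newline-delimited speech-mark output for two words.
raw = "\n".join([
    '{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}',
    '{"time":732,"type":"word","start":6,"end":9,"value":"how"}',
])

marks = [json.loads(line) for line in raw.splitlines()]
for m in marks:
    # 'time' is the audio offset in ms; 'start'/'end' index into the input text,
    # which is exactly what a karaoke-style highlighter needs.
    print(m["time"], m["value"])
```

A UI can walk this list during playback and highlight the word whose 'time' has just elapsed.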
Use Cases:
- Applications that talk (e.g., navigation systems, customer service bots)
- Accessibility features for visually impaired users
- Audiobook generation
- E-learning content
- IVR (Interactive Voice Response) systems
Quick Reference: Transcribe vs. Polly:
| Service | Direction | Technology |
|---|---|---|
| Amazon Transcribe | Audio -> Text | ASR (speech recognition) |
| Amazon Polly | Text -> Audio | TTS (speech synthesis) |
Key Terms
| Term | Definition |
|---|---|
| Amazon Polly | A fully managed text-to-speech service that converts written text into natural-sounding speech. The opposite of Amazon Transcribe (which does speech-to-text). |
| SSML (Speech Synthesis Markup Language) | An XML-based markup language used with Polly to control how text is spoken -- adding pauses, emphasis, whispers, pronunciation hints, speaking rate adjustments, and more. |
| Lexicons (Polly) | Custom pronunciation dictionaries that tell Polly how to pronounce specific text strings -- e.g., expand 'AWS' to say 'Amazon Web Services'. |
| Speech Marks (Polly) | Metadata output from Polly that identifies where in the audio each word or sentence begins and ends. Used for lip-syncing animations or highlighting text as it is spoken. |
| Generative Voice Engine (Polly) | Polly's newest and most expressive voice engine, powered by generative AI, producing the most natural and adaptive speech quality. |
- Polly = TEXT to SPEECH. Transcribe = SPEECH to TEXT. These appear together as a trick question.
- Lexicons = control PRONUNCIATION of specific strings (AWS -> 'Amazon Web Services').
- SSML = fine-grained control of HOW text is spoken (pauses, emphasis, whisper, speaking rate).
- Speech Marks = WHERE in audio each word starts/ends. Used for lip-sync and word highlighting.
- Generative > Long-form > Neural > Standard -- newest to oldest, best to most basic quality.
- Polly Lexicons vs. Translate Custom Terminology: Lexicons = pronunciation. Terminology = translation mapping.
Practice Questions
Q1. An e-learning platform uses Amazon Polly to read course content aloud. They need words to be highlighted in the UI as they are spoken, synchronized with the audio. Which Polly feature enables this?
- SSML -- to add markup tags that control word timing
- Lexicons -- to define how each word is pronounced
- Speech Marks -- to get metadata about where each word starts and ends in the audio
- Generative Engine -- to produce the most accurate timing for each word
Answer: C
Speech Marks is a Polly feature that provides metadata identifying exactly where in the audio each word or sentence begins and ends. This timing information enables the UI to synchronize word highlighting with the audio playback, creating a karaoke-style reading experience.
Q2. A company wants Amazon Polly to pronounce 'AWS' as 'Amazon Web Services' every time it appears in their text-to-speech application. Which feature enables this?
- SSML markup
- Lexicons
- Speech Marks
- Neural Engine
Answer: B
Lexicons define how specific text strings should be pronounced. By adding 'AWS' to a lexicon with the pronunciation 'Amazon Web Services', Polly will expand the acronym every time it encounters it.
Q3. A developer needs precise control over speech output including pauses, emphasis, whispers, and speaking rate. Which Polly feature provides this fine-grained control?
- Lexicons
- SSML (Speech Synthesis Markup Language)
- Speech Marks
- Custom Vocabulary
Answer: B
SSML is an XML-based markup language that gives fine-grained control over how text is spoken -- allowing pauses, emphasis, whispers, pronunciation hints, speaking rate adjustments, and more.
Q4. Which Amazon Polly voice engine provides the most natural, expressive speech using generative AI?
- Standard Engine
- Neural Engine
- Long-form Engine
- Generative Engine
Answer: D
The Generative Engine is Polly's newest and most expressive voice engine, powered by generative AI. It produces the most natural and adaptive speech quality, ideal for high-quality conversational applications.
Q5. What is the key difference between Polly Lexicons and Amazon Translate Custom Terminology?
- Both control how words are translated between languages
- Lexicons control pronunciation; Custom Terminology controls translation mapping
- Both are the same feature in different services
- Lexicons are for batch processing; Custom Terminology is for real-time
Answer: B
Polly Lexicons control how specific text strings are PRONOUNCED in speech output. Amazon Translate Custom Terminology controls how specific terms are TRANSLATED between languages. Different purposes for different services.
Amazon Rekognition
What is Amazon Rekognition?
Amazon Rekognition is a fully managed computer vision service that uses machine learning to analyze images and videos -- detecting objects, people, text, scenes, faces, and inappropriate content.
Core Capabilities:
| Capability | What It Detects | Use Case |
|---|---|---|
| Label Detection | Objects, scenes, activities | 'Person', 'Car', 'Skateboard', 'Outdoors' with confidence scores |
| Facial Analysis | Age range, gender, emotions, facial attributes | Emotion detection, attendance tracking |
| Face Comparison | How similar two faces are | Verifying if two photos are the same person |
| Face Search | Match a face against a database of known faces | Celebrity recognition, user verification |
| Text in Image (OCR) | Detect and extract text from images | Read signs, license plates, on-screen text |
| Content Moderation | Unsafe/inappropriate content | Filter adult, violent, or offensive content |
| Celebrity Recognition | Identify well-known public figures | Media archiving, content tagging |
| Face Liveness | Verify a face is a real, live person (not a photo) | Fraud prevention in identity verification |
| Personal Protective Equipment (PPE) Detection | Detect face masks, hard hats, safety vests | Workplace safety compliance |
| Pathing | Track movement of people or objects across frames | Sports analytics, retail foot traffic |
Custom Labels:
Extend Rekognition to recognize YOUR OWN objects, logos, or products -- not just generic categories.
- Provide labeled training images (only a few hundred needed)
- Upload to Amazon S3 -> Rekognition trains a custom model
- After training, new images are analyzed for YOUR specific objects/logos
- Use case: The NFL detects its logos in social media photos
Content Moderation in Detail:
- Automatically flags inappropriate, unsafe, or offensive content in images/videos
- Reduces human review burden to ~1-5% of total content volume (only ambiguous cases go to humans)
- Human review integration: use Amazon Augmented AI (A2I) for final decisions on flagged content
- Custom Moderation Adapter: extend default moderation by training Rekognition with your own labeled examples for domain-specific moderation rules
- Human-reviewed results can feed BACK into Rekognition training to continuously improve accuracy
Content Moderation API Flow (Exam Scenario):
User request -> Chatbot generates image -> Send to Rekognition DetectModerationLabels API -> If SAFE -> return image to user; if UNSAFE -> block/flag image
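The decision step in that flow can be sketched as a small function. The real API call is `detect_moderation_labels` on the boto3 Rekognition client; the response dict below is hand-built to mirror its documented shape (`ModerationLabels` with `Name`, `ParentName`, `Confidence`) for illustration:

```python
# Real call (not executed here):
#   rekognition = boto3.client('rekognition')
#   resp = rekognition.detect_moderation_labels(
#       Image={'Bytes': image_bytes}, MinConfidence=60)
# Illustrative sample response in the same shape:
SAMPLE_RESPONSE = {
    "ModerationLabels": [
        {"Name": "Suggestive", "ParentName": "", "Confidence": 83.2},
    ]
}

def is_safe(response, min_confidence=60.0):
    """An image is treated as safe only if no moderation label
    meets the confidence threshold."""
    return not any(label["Confidence"] >= min_confidence
                   for label in response["ModerationLabels"])

print(is_safe(SAMPLE_RESPONSE))           # False -> block/flag the image
print(is_safe({"ModerationLabels": []}))  # True  -> return image to user
```

In production, the borderline cases (labels near the threshold) are the ~1-5% that would be routed to human reviewers via A2I rather than decided automatically.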
Supported Input Formats:
Images: JPEG, PNG
Videos: via S3 bucket (for batch) or Kinesis Video Streams (for real-time)
Key Terms
| Term | Definition |
|---|---|
| Amazon Rekognition | A fully managed computer vision service for analyzing images and videos. Detects objects, faces, text, scenes, unsafe content, celebrities, and more using pre-trained ML models. |
| Custom Labels (Rekognition) | A feature allowing users to train Rekognition to recognize custom objects, logos, or products by providing labeled training images. Used when default labels don't cover business-specific needs. |
| Content Moderation (Rekognition) | Rekognition's ability to automatically detect inappropriate, offensive, or unsafe content in images and videos. Reduces human review volume to 1-5% of content. |
| Face Liveness (Rekognition) | A Rekognition feature that verifies whether the face in front of a camera is a real, live person -- not a photo or spoofed identity. Used in identity verification workflows. |
| DetectModerationLabels API | The Amazon Rekognition API that analyzes an image and returns labels indicating whether the content is safe or contains inappropriate material -- enabling automated content moderation in applications. |
| Amazon Augmented AI (A2I) | An AWS service that routes low-confidence ML predictions (from Rekognition, Textract, or custom models) to human reviewers for manual validation. Integrates with Mechanical Turk or private workforce. |
- Rekognition = images and videos. Know ALL the capabilities: labels, faces, text, moderation, celebrities, liveness, PPE.
- Custom Labels = train Rekognition on YOUR OWN objects/logos with just a few hundred images.
- Content Moderation -> Rekognition reduces human review to 1-5%. A2I handles the remaining ambiguous cases.
- Face Liveness = verify a REAL person (not a spoofed photo). Used in identity verification.
- DetectModerationLabels API = the specific API for content moderation in Rekognition.
- Rekognition text-in-image is DIFFERENT from Textract -- Rekognition is for detecting text in images; Textract is for structured document extraction.
Practice Questions
Q1. A social media company wants to automatically detect and remove explicit content from user-uploaded images before they are published. They need to minimize human review costs while maintaining safety standards. Which combination of services is BEST?
- Amazon Comprehend for sentiment analysis + Amazon Translate for content classification
- Amazon Rekognition Content Moderation to auto-flag content + Amazon A2I for human review of ambiguous cases
- Amazon Textract to extract image text + Amazon Comprehend to classify the text as inappropriate
- Amazon Lex with content filtering rules + Amazon Polly for content narration
Answer: B
Amazon Rekognition Content Moderation automatically detects inappropriate content, handling ~95-99% of cases automatically and reducing human review to just 1-5% of ambiguous cases. Amazon Augmented AI (A2I) then routes those ambiguous cases to human reviewers for final decisions.
Q2. A company wants Amazon Rekognition to identify when its company logo appears in social media photos. The default label detection does not recognize the logo. What should they do?
- Use Rekognition's Celebrity Recognition feature -- logos are treated like public figures
- Use Amazon Comprehend custom entities to define the logo as a text-based entity
- Use Rekognition Custom Labels -- train the model with labeled images containing the logo
- Use Amazon Textract to extract the logo text from photos
Answer: C
Rekognition Custom Labels allows companies to train Rekognition on their own specific objects, logos, and products by providing labeled training images. After training on images containing the logo, Rekognition can reliably detect that specific logo in new images.
Q3. A banking app needs to verify that the person applying for an account is a real live person and not a photo of someone else. Which Rekognition feature addresses this?
- Face Comparison
- Face Liveness
- Content Moderation
- Celebrity Recognition
Answer: B
Face Liveness verifies that the face in front of a camera is a real, live person -- not a photo or spoofed identity. This prevents fraud in identity verification workflows by detecting presentation attacks.
Q4. A construction company needs to verify that workers are wearing required safety equipment (hard hats, vests) in photos from job sites. Which Rekognition capability handles this?
- Label Detection
- Content Moderation
- Personal Protective Equipment (PPE) Detection
- Custom Labels
Answer: C
PPE Detection is a built-in Rekognition capability that detects personal protective equipment including face masks, hard hats, and safety vests in images, enabling automated workplace safety compliance monitoring.
Q5. What percentage of content does Amazon Rekognition Content Moderation typically process automatically, leaving only ambiguous cases for human review?
- 50%
- 75%
- 95-99%
- 100%
Answer: C
Rekognition Content Moderation automatically handles approximately 95-99% of content, reducing human review burden to just 1-5% of ambiguous cases. This dramatically reduces moderation costs while maintaining safety standards.
Q6. An image in a photo album contains text showing a street address. Which AWS service can detect and extract this text from the image?
- Amazon Comprehend
- Amazon Rekognition Text in Image
- Amazon Translate
- Amazon Polly
Answer: B
Rekognition Text in Image (OCR) detects and extracts text visible in photos and real-world scenes, such as signs, labels, license plates, and on-screen text. This is different from Textract, which extracts structured data from documents.
Amazon Lex
What is Amazon Lex?
Amazon Lex is a fully managed service for building conversational AI chatbots that can interact with users via voice or text. It is the same technology that powers Amazon Alexa.
Core Concepts:
| Concept | Definition | Example |
|---|---|---|
| Intent | What the user wants to accomplish | 'Book a hotel', 'Order pizza', 'Check account balance' |
| Utterance | A phrase that triggers an intent | 'I want to book a hotel' or 'Reserve a room' -> triggers BookHotel intent |
| Slot | An input parameter needed to fulfill the intent | City, check-in date, number of nights, room type for a hotel booking |
| Fulfillment | The action taken when all slots are filled | Invoke an AWS Lambda function to execute the booking |
How a Lex Bot Works:
- User says/types an utterance (e.g., 'Book a hotel for 3 nights in Paris')
- Lex identifies the intent (BookHotel)
- Lex checks which slots are filled (Paris = city [check], 3 nights [check]) and which are missing (check-in date [x])
- Lex asks for missing slots ('What day do you want to check in?')
- Once all slots are filled, Lex calls the configured AWS Lambda function
- Lambda performs the actual booking in the backend system
- Lex returns the confirmation message to the user
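The slot-collection loop above can be sketched as a toy function. This is illustrative logic only, not the Lex runtime API (a real client would call `recognize_text` on the `lexv2-runtime` client); the slot names and prompts are made up for the hotel example:

```python
# Required slots for a hypothetical BookHotel intent.
REQUIRED_SLOTS = ["city", "check_in_date", "nights"]
PROMPTS = {
    "city": "Which city?",
    "check_in_date": "What day do you want to check in?",
    "nights": "How many nights?",
}

def next_prompt(filled_slots: dict):
    """Return the follow-up question for the first missing slot, or None
    when every required slot is filled (i.e., ready for Lambda fulfillment)."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled_slots:
            return PROMPTS[slot]
    return None  # all slots filled -> invoke the fulfillment Lambda

# 'Book a hotel for 3 nights in Paris' fills city and nights, not the date:
print(next_prompt({"city": "Paris", "nights": 3}))
# -> 'What day do you want to check in?'
print(next_prompt({"city": "Paris", "nights": 3, "check_in_date": "2024-06-01"}))
# -> None
```

The key exam idea this illustrates: Lex keeps asking targeted follow-up questions until every required slot is filled, and only then triggers fulfillment.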
Two Creation Methods:
- Traditional Bot -- manually configure intents, utterances, slots, and fulfillment
- Generative AI Bot -- describe the bot in natural language; Bedrock generates the configuration automatically
Key Integrations:
- AWS Lambda -- fulfillment actions (execute bookings, database lookups)
- Amazon Connect -- deploy Lex bots in call center phone systems
- Amazon Comprehend -- add NLP understanding to classify customer intent
- Amazon Kendra -- connect to a document knowledge base for Q&A
Use Cases:
- Customer service chatbots
- Booking and reservation systems (hotels, flights, restaurants)
- IT help desk bots
- FAQ bots for websites
- Phone-based IVR (Interactive Voice Response) systems
Supports: Multiple languages
Exam Tip:
Amazon Lex = chatbot/conversational AI builder. When the scenario asks about building a chatbot with intents and slot collection -> Lex.
Key Terms
| Term | Definition |
|---|---|
| Amazon Lex | A fully managed service for building conversational AI chatbots using voice or text. Uses the same technology as Amazon Alexa. |
| Intent (Lex) | The goal or action a user wants to accomplish in a Lex bot -- e.g., 'Book a hotel', 'Check balance'. Each bot can have multiple intents. |
| Utterance (Lex) | A sample phrase that triggers a specific intent. Lex uses these examples to learn which phrases map to which intents. |
| Slot (Lex) | An input parameter that a Lex bot collects from the user to fulfill an intent. For hotel booking: city, check-in date, number of nights, room type are all slots. |
| Fulfillment (Lex) | The action executed by Lex once all required slots are collected -- typically invoking an AWS Lambda function to perform the backend operation. |
- Lex = chatbot builder. Intent = what user wants. Utterance = how they say it. Slot = parameters needed. Lambda = fulfills the action.
- Lex bots collect SLOTS (parameters) through conversation -- asking follow-up questions until all required info is gathered.
- Lex -> Lambda = the standard fulfillment pattern. Lambda performs the actual business logic.
- Amazon Connect integration = Lex-powered phone bots for call centers.
- Generative AI bot creation in Lex uses Amazon Bedrock to auto-generate bot configuration from a natural language description.
Practice Questions
Q1. A hotel chain wants to build a chatbot that allows guests to make room reservations by providing their preferred city, check-in date, number of nights, and room type via text or voice. Which service and concepts are MOST relevant?
- Amazon Kendra with intent recognition and document search
- Amazon Lex with intents, slots, and Lambda fulfillment
- Amazon Comprehend with custom classification for reservation requests
- Amazon Personalize with a hotel booking recipe
Answer: B
Amazon Lex is designed exactly for this use case. The booking intent is defined with slots (city, check-in date, nights, room type). Lex converses with the user to collect all slot values, then invokes a Lambda function to execute the reservation in the hotel's booking system.
Q2. In Amazon Lex, what is a 'slot'?
- A time period when the bot is available
- An input parameter that the bot collects from the user to fulfill an intent
- A backup response when the bot doesn't understand
- A connection to a database
Answer: B
In Lex, a slot is an input parameter needed to fulfill an intent. For example, a hotel booking intent needs slots for city, check-in date, number of nights, and room type. Lex asks follow-up questions until all required slots are filled.
Q3. A company wants to deploy their Amazon Lex chatbot to their phone-based customer service system. Which AWS service integrates with Lex for this purpose?
- Amazon Polly
- Amazon Connect
- Amazon Transcribe
- Amazon SNS
Answer: B
Amazon Connect is AWS's cloud contact center service that integrates with Amazon Lex to deploy chatbots in phone-based IVR (Interactive Voice Response) systems, enabling voice-based conversational AI for customer service.
Q4. What happens in Amazon Lex after all required slots for an intent have been collected from the user?
- The conversation ends automatically
- Lex invokes the configured AWS Lambda function for fulfillment
- Lex sends an email notification
- The user must manually confirm completion
Answer: B
Once all required slots are filled, Lex triggers fulfillment by invoking the configured AWS Lambda function. Lambda performs the actual business logic (database lookup, booking, API call) and returns a confirmation message to the user.
Q5. Amazon Lex now offers a way to create bots using natural language descriptions. What technology powers this capability?
- Amazon Comprehend
- Amazon SageMaker
- Amazon Bedrock (Generative AI)
- Amazon Translate
Answer: C
Generative AI bot creation in Amazon Lex uses Amazon Bedrock to automatically generate bot configuration (intents, utterances, slots) from a natural language description, simplifying bot development.
Amazon Personalize
What is Amazon Personalize?
Amazon Personalize is a fully managed machine learning service that enables developers to build applications with real-time, personalized recommendations -- the same technology powering Amazon.com product recommendations.
Key Positioning:
- No need to build, train, or deploy ML models from scratch
- Takes days, not months, to implement
- Integrates with S3 for batch data ingestion and real-time API for live data
- Exposes a customized recommendation API for your web/mobile apps
How It Works:
- Provide user interaction data, item catalog, and user metadata via S3 or real-time API
- Choose a Recipe (pre-built algorithm for a specific recommendation use case)
- Personalize trains and hosts the recommendation model
- Call the Personalize API from your application to get personalized recommendations in real-time
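The final step is a single runtime call once a campaign is trained. As a local stand-in, the toy below mimics the idea behind the simplest recipe (Popularity-Count) by ranking items on raw interaction counts; it is NOT the Personalize algorithm, and the sample interaction data is invented:

```python
from collections import Counter

# Real runtime call (not executed here; CAMPAIGN_ARN is a placeholder):
#   personalize_rt = boto3.client('personalize-runtime')
#   resp = personalize_rt.get_recommendations(
#       campaignArn=CAMPAIGN_ARN, userId='user-42', numResults=5)
interactions = [
    ("user-1", "item-a"), ("user-2", "item-a"), ("user-3", "item-b"),
    ("user-1", "item-b"), ("user-2", "item-b"), ("user-3", "item-c"),
]

def most_popular(events, num_results=2):
    """Rank items by interaction count -- a crude sketch of the
    Popularity-Count recipe's output shape."""
    counts = Counter(item for _user, item in events)
    return [item for item, _n in counts.most_common(num_results)]

print(most_popular(interactions))  # ['item-b', 'item-a']
```

The point of the managed service is that you never implement this ranking yourself: you pick a recipe, Personalize trains and hosts the model, and your app just calls the API.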
Recipes -- Pre-Built Algorithms:
Recipes are Personalize's name for pre-implemented ML algorithms, each designed for a specific recommendation scenario:
| Recipe Category | Recipe Name | Use Case |
|---|---|---|
| USER_PERSONALIZATION | User-Personalization-v2 | Recommend items for each specific user based on their history |
| PERSONALIZED_RANKING | Personalized-Ranking-v2 | Re-rank a list of items in order of relevance for a specific user |
| RELATED_ITEMS | Similar-Items | Recommend items similar to one the user is viewing ('customers also bought') |
| TRENDING | Trending-Now | Recommend currently trending or popular items |
| POPULARITY | Popularity-Count | Recommend the most popular items overall |
| NEXT_BEST_ACTION | Next-Best-Action | Recommend the next action or content a user should engage with |
| USER_SEGMENTATION | Item-Affinity | Group users into segments based on item preferences |
Critical Exam Rule: Recipes in Personalize are ALWAYS about RECOMMENDATIONS -- not forecasting, not classification, not any other task.
Delivery Channels:
- Website and mobile app APIs (real-time recommendations)
- Email and SMS marketing campaigns (batch personalization)
Use Cases:
- E-commerce: 'Customers who bought X also bought Y'
- Streaming: 'Recommended shows for you'
- News: 'Articles you might like'
- Retail: Personalized promotions and email campaigns
Key Terms
| Term | Definition |
|---|---|
| Amazon Personalize | A fully managed ML service for building real-time personalized recommendation systems -- the same technology behind Amazon.com's product recommendations. |
| Recipe (Personalize) | A pre-built ML algorithm in Amazon Personalize for a specific recommendation scenario. All recipes produce personalized recommendations -- not forecasts or classifications. |
| User-Personalization (Recipe) | A Personalize recipe that recommends items for each user based on their individual interaction history and preferences. |
| Personalized Ranking (Recipe) | A Personalize recipe that re-orders a provided list of items based on the relevance to a specific user's preferences. |
| Similar Items (Recipe) | A Personalize recipe that recommends items related to one the user is currently viewing -- 'customers who viewed this also viewed'. |
- Personalize = RECOMMENDATIONS ONLY. Not forecasting, not classification -- personalized recommendations.
- Recipes = pre-built algorithms. All Personalize recipes produce some form of recommendation.
- The exam may ask 'what service provides personalized product recommendations?' -> Amazon Personalize.
- Personalize is the SAME technology as Amazon.com's own recommendation engine -- key selling point.
- Personalize supports REAL-TIME API recommendations AND batch delivery via email/SMS campaigns.
Practice Questions
Q1. A streaming service wants to show each user a personalized list of movies ranked in order of likelihood they'll enjoy them, based on their viewing history. Which Amazon Personalize recipe is MOST appropriate?
- User-Personalization-v2 -- to recommend movies to each user from the full catalog
- Personalized-Ranking-v2 -- to re-rank a curated list of movies in order of relevance for each user
- Similar-Items -- to recommend movies similar to what the user just watched
- Trending-Now -- to recommend currently trending movies to all users
Answer: B
Personalized-Ranking-v2 re-orders a provided list of items in descending order of relevance for a specific user. When you have a curated list (e.g., this week's new releases) and want to personalize the order for each viewer, Personalized Ranking is the correct recipe.
Q2. An e-commerce site wants to show 'Customers who bought this also bought...' recommendations on product pages. Which Personalize recipe is MOST appropriate?
- User-Personalization-v2
- Personalized-Ranking-v2
- Similar-Items
- Trending-Now
Answer: C
Similar-Items recommends items related to one the user is currently viewing, perfect for 'customers who bought this also bought' or 'related products' scenarios on product detail pages.
Q3. What type of output does Amazon Personalize ALWAYS produce, regardless of which recipe is used?
- Forecasts
- Classifications
- Recommendations
- Translations
Answer: C
Amazon Personalize recipes ALWAYS produce personalized recommendations. It is not a forecasting service (that's Amazon Forecast) or a classification service (that's Comprehend). All Personalize recipes are recommendation-focused.
Q4. Amazon Personalize uses the same recommendation technology as which famous platform?
- Netflix
- Amazon.com
- Spotify
- YouTube
Answer: B
Amazon Personalize is built on the same ML technology that powers Amazon.com's product recommendations. This is a key selling point -- the same technology behind one of the world's most successful recommendation engines is available as a managed service.
Q5. A news website wants to recommend currently trending articles to all users (not personalized per user). Which Personalize recipe should they use?
- User-Personalization-v2
- Similar-Items
- Trending-Now
- Next-Best-Action
Answer: C
Trending-Now recommends currently trending or popular items across all users. This is ideal for highlighting what's popular right now, rather than personalized recommendations based on individual user history.
Amazon Textract
What is Amazon Textract?
Amazon Textract is a fully managed ML service that automatically extracts text, handwriting, forms, tables, and structured data from scanned documents and images -- going far beyond simple OCR.
What Makes Textract Different from Basic OCR:
Basic OCR reads raw characters. Textract understands STRUCTURE -- it knows the difference between a title, a table cell, a form field, and a value.
Extraction Capabilities:
| Capability | What It Extracts | Example |
|---|---|---|
| Raw Text | All text content from the document | Every word on a paystub |
| Layout | Document structure (title, section header, paragraph) | 'Earning Statements' identified as a title |
| Forms (Key-Value Pairs) | Field name + its associated value | 'Period Ending' -> '7/18/2008', 'SSN' -> '***-**-****' |
| Tables | Rows, columns, and cell values | Full earnings table with rates, hours, and totals |
| Queries | Natural language questions about document content | 'What is the year-to-date gross pay?' -> '$45,200' |
| Expense Analysis | Vendor info, line items, totals from receipts/invoices | Vendor: 'ABC Corp', Item: 'Laptop', Price: '$1,200' |
| ID Analysis | Standardized field extraction from government IDs | First name, last name, DOB, address, document number |
Supported Input Formats:
Images (JPEG, PNG), PDFs
Analysis Types:
- Synchronous (Real-time) -- for single pages/images
- Asynchronous (Batch) -- for multi-page PDFs via S3
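What "structured extraction" means in practice: an `AnalyzeDocument` response is a flat list of `Blocks`, where KEY_VALUE_SET blocks link to their WORD children (and KEY blocks to their VALUE blocks) via `Relationships`. The sketch below walks that graph, assuming the documented block shape; the sample blocks are hand-built and heavily trimmed for illustration:

```python
# Real call (not executed here):
#   textract.analyze_document(Document={'Bytes': doc}, FeatureTypes=['FORMS'])
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Pay"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Date"},
    {"Id": "w3", "BlockType": "WORD", "Text": "2024-01-15"},
]

def extract_key_values(blocks):
    """Resolve KEY blocks to their text and their linked VALUE text."""
    by_id = {b["Id"]: b for b in blocks}

    def child_text(block):
        words = [by_id[i]["Text"]
                 for rel in block.get("Relationships", [])
                 if rel["Type"] == "CHILD" for i in rel["Ids"]]
        return " ".join(words)

    pairs = {}
    for b in blocks:
        if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
            value_ids = [i for rel in b.get("Relationships", [])
                         if rel["Type"] == "VALUE" for i in rel["Ids"]]
            pairs[child_text(b)] = " ".join(child_text(by_id[i]) for i in value_ids)
    return pairs

print(extract_key_values(SAMPLE_BLOCKS))  # {'Pay Date': '2024-01-15'}
```

This key-to-value linkage is precisely what basic OCR cannot give you: OCR would return "Pay Date 2024-01-15" as undifferentiated text, with no indication of which string labels which.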
Use Cases by Industry:
- Financial Services -- invoice processing, financial report extraction, loan document analysis
- Healthcare -- medical records, insurance claims, prescription extraction
- Public Sector -- tax forms, identity documents (passports, IDs), government applications
Textract vs. Rekognition Text-in-Image:
| Feature | Amazon Textract | Rekognition Text in Image |
|---|---|---|
| Purpose | Structured document extraction | Read text visible in photos/scenes |
| Context | Forms, tables, key-value pairs from documents | Signs, labels, numbers in photos |
| Best for | Scanned business documents | Text embedded in real-world images |
Key Terms
| Term | Definition |
|---|---|
| Amazon Textract | A fully managed ML service that extracts text, handwriting, forms, tables, and structured data from scanned documents and images -- far beyond basic character recognition. |
| Key-Value Pair Extraction (Textract) | Textract's ability to identify form fields and their corresponding values -- e.g., 'Pay Date' -> '2024-01-15', enabling structured data extraction from forms. |
| Table Extraction (Textract) | Textract's ability to detect and extract complete tables from documents, preserving the row/column structure of the data. |
| Queries (Textract) | A Textract feature that allows natural language questions about document content -- e.g., 'What is the total amount due?' -- and returns the specific extracted answer. |
- Textract = extract STRUCTURED data from documents (forms, tables, key-value pairs). Not just raw text.
- Textract vs. Rekognition: Textract = structured document extraction. Rekognition text = reading text in photos of real-world scenes.
- Textract supports: text, handwriting, forms, tables, queries, expense analysis, and ID document analysis.
- Use case: 'extract data from scanned invoices/medical records/IDs' -> Amazon Textract.
- Textract Queries = ask natural language questions about document content and get extracted answers.
Practice Questions
Q1. A bank processes thousands of loan application forms daily. Each form has standardized fields (applicant name, income, loan amount requested). They want to automatically extract all field-value pairs from scanned PDFs. Which service is MOST appropriate?
- Amazon Comprehend -- to extract entities like names and amounts from the text
- Amazon Rekognition -- to detect text in scanned form images
- Amazon Textract -- to extract key-value pairs from structured form documents
- Amazon Kendra -- to index and search loan application documents
Answer: C
Amazon Textract is specifically designed to extract key-value pairs from structured forms -- identifying both the field label ('Applicant Name') and its corresponding value ('John Smith'). This is more powerful than Comprehend's NER (which identifies entities in free text) or Rekognition's text detection (which reads raw characters in images).
Q2. What makes Amazon Textract different from basic OCR (Optical Character Recognition)?
- Textract only works with PDFs
- Textract understands document STRUCTURE -- forms, tables, key-value pairs -- not just raw characters
- Textract requires training on each document type
- Textract only extracts handwritten text
Answer: B
While basic OCR reads raw characters, Textract understands document STRUCTURE. It knows the difference between titles, table cells, form fields, and values -- enabling structured data extraction rather than just text reading.
Q3. A company wants to ask natural language questions about document content, like 'What is the total amount due?' and get direct answers. Which Textract feature enables this?
- Table Extraction
- Key-Value Pair Extraction
- Queries
- Layout Detection
Answer: C
Textract Queries allow you to ask natural language questions about document content and receive specific extracted answers. Instead of extracting all fields, you can ask targeted questions like 'What is the total amount due?' and get just that value.
Q4. An HR department needs to extract data from employee ID cards including name, date of birth, and document number. Which Textract capability handles this?
- Forms Extraction
- Table Extraction
- ID Analysis
- Expense Analysis
Answer: C
ID Analysis provides standardized field extraction from government-issued ID documents, extracting fields like first name, last name, date of birth, address, and document number in a structured format.
Q5. What is the difference between Amazon Textract and Amazon Rekognition Text-in-Image?
- They are the same feature
- Textract extracts structured data from documents; Rekognition reads text in real-world photos/scenes
- Textract is for audio; Rekognition is for images
- Rekognition extracts tables; Textract detects objects
Answer: B
Textract is designed for structured document extraction (forms, tables, invoices). Rekognition Text-in-Image reads text visible in real-world photos and scenes (signs, labels, license plates). Different tools for different contexts.
Amazon Kendra, Mechanical Turk, and Augmented AI (A2I)
Amazon Kendra -- Intelligent Document Search:
Amazon Kendra is a fully managed, ML-powered enterprise document search service. Unlike keyword search, Kendra understands the MEANING of questions and returns direct answers.
How It Works:
- Index documents from multiple sources: S3, SharePoint, OneDrive, Confluence, RDS, databases, FAQs, PDFs, HTML, Word, PowerPoint
- Build an internal ML-powered knowledge index
- Users ask natural language questions; Kendra returns direct answers -- not just a list of links
Key Features:
- Natural Language Queries: 'Where is the IT support desk?' -> Kendra returns 'First floor' (extracted from an HR document)
- Incremental Learning: Kendra learns from user feedback and clicks to improve future search results
- Fine-tuning: Customize relevance based on data freshness, document importance, or custom metadata filters
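"Direct answers, not just links" shows up in the response shape: Kendra's `query` returns `ResultItems` typed as ANSWER, QUESTION_ANSWER, or DOCUMENT. The sketch below prefers a direct answer over plain document hits, assuming that documented shape; the sample response and `INDEX_ID` placeholder are illustrative:

```python
# Real call (not executed here; INDEX_ID is a placeholder):
#   kendra = boto3.client('kendra')
#   resp = kendra.query(IndexId=INDEX_ID,
#                       QueryText='Where is the IT support desk?')
SAMPLE_RESPONSE = {
    "ResultItems": [
        {"Type": "ANSWER",
         "DocumentExcerpt": {"Text": "The IT support desk is on the first floor."}},
        {"Type": "DOCUMENT",
         "DocumentExcerpt": {"Text": "Facilities handbook, chapter 3..."}},
    ]
}

def top_answer(response):
    """Prefer a direct ANSWER result over ordinary document hits."""
    for item in response["ResultItems"]:
        if item["Type"] == "ANSWER":
            return item["DocumentExcerpt"]["Text"]
    return None

print(top_answer(SAMPLE_RESPONSE))
# -> 'The IT support desk is on the first floor.'
```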
Kendra vs. Q Business:
Both provide document-based Q&A, but Q Business is an enterprise assistant with authentication, plugins, and admin controls. Kendra is specifically a search service that can be embedded in any application.
Exam Tip: 'ML-powered document search service' -> Amazon Kendra
---
Amazon Mechanical Turk -- Human Crowdsourcing:
Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace that provides access to a distributed global workforce to perform simple, scalable human tasks.
Named After: the 1770 chess-playing 'Mechanical Turk' -- an apparent chess-playing automaton that was secretly operated by a hidden human.
How It Works:
- Requesters publish small tasks (HITs -- Human Intelligence Tasks) with a reward per task
- Workers worldwide accept and complete tasks for the reward
- Results are aggregated and returned to the requester
Common Task Types:
- Image labeling / classification for ML training data
- Data collection and validation
- Sentiment annotation
- Content moderation review
- Business process tasks (data entry, form filling)
Why It Matters for AI/ML:
The primary use of MTurk in AI workflows is DATA LABELING -- creating the labeled training datasets that supervised ML algorithms require. Labeling millions of images by hand is impractical for one team; MTurk distributes this across thousands of workers.
Integrations:
- Amazon A2I -- for human review of ML predictions
- Amazon SageMaker Ground Truth -- for labeling workflows
---
Amazon Augmented AI (A2I) -- Human Review of ML Predictions:
Amazon A2I enables human oversight of machine learning predictions when confidence is low or when random auditing is required.
How It Works:
- ML model makes a prediction (from Rekognition, Textract, or a custom model)
- A2I evaluates confidence:
- High confidence -> prediction returned directly to the application
- Low confidence -> routed to human reviewers
- Humans review and correct the prediction
- Results are stored in S3
- Approved predictions can be fed back into the ML model to improve it over time
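The confidence gate at the heart of that flow can be sketched as a toy router. This is illustrative logic only; the actual human-loop kickoff uses the `sagemaker-a2i-runtime` client's `start_human_loop` with a flow definition ARN, which the sketch merely stubs out in a comment, and the 0.80 threshold is an arbitrary example value:

```python
CONFIDENCE_THRESHOLD = 0.80  # example value; tune per use case

def route_prediction(prediction: dict):
    """High-confidence predictions go straight back to the application;
    low-confidence ones are routed to human reviewers."""
    if prediction["confidence"] >= CONFIDENCE_THRESHOLD:
        return ("application", prediction["label"])
    # In a real A2I workflow this branch would call:
    #   a2i.start_human_loop(HumanLoopName=..., FlowDefinitionArn=...,
    #                        HumanLoopInput={'InputContent': ...})
    return ("human_review", prediction["label"])

print(route_prediction({"label": "approved_claim", "confidence": 0.95}))
# -> ('application', 'approved_claim')
print(route_prediction({"label": "approved_claim", "confidence": 0.42}))
# -> ('human_review', 'approved_claim')
```

The human decisions collected on the low-confidence branch are what later feed back into the model to improve it, closing the loop described above.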
Supported Task Types:
- Image moderation (Rekognition)
- Document key-value extraction (Textract)
- Custom ML model predictions
Who Reviews? Three workforce options:
- Private Workforce -- your own employees (for confidential/sensitive data)
- Amazon Mechanical Turk -- 500,000+ independent contractors (for general tasks)
- Vendor Workforce -- pre-screened third-party vendors from AWS Marketplace (for specialized/confidential tasks)
Exam Pattern:
'Human review of ML predictions when confidence is low' -> Amazon Augmented AI (A2I)
'Crowdsourced human labeling of data' -> Amazon Mechanical Turk
'ML-powered document search' -> Amazon Kendra
Key Terms
| Term | Definition |
|---|---|
| Amazon Kendra | A fully managed ML-powered enterprise document search service that understands natural language questions and returns direct answers from indexed documents. |
| Incremental Learning (Kendra) | Kendra's ability to improve search result relevance over time by learning from user interaction feedback (clicks, ratings, and query behavior). |
| Amazon Mechanical Turk | A crowdsourcing marketplace that provides access to a global human workforce for simple, scalable tasks -- primarily used for data labeling, annotation, and content review in AI/ML workflows. |
| HIT (Human Intelligence Task) | A discrete unit of work posted on Amazon Mechanical Turk -- e.g., 'label this image as cat or dog'. Workers complete HITs for a defined reward. |
| Amazon Augmented AI (A2I) | An AWS service that adds human review to ML prediction workflows. Routes low-confidence predictions to human reviewers (employees, MTurk workers, or vendor workforce) for validation. |
| Private Workforce (A2I) | A company's own employees used as human reviewers in an A2I workflow -- the appropriate choice when predictions contain sensitive or confidential data. |
- Kendra = DOCUMENT SEARCH with natural language. Returns direct answers, not just links.
- Mechanical Turk = CROWDSOURCED HUMAN TASKS. Primary AI use case = data labeling for ML training.
- A2I = HUMAN REVIEW of ML PREDICTIONS when confidence is low. Not for training data -- for reviewing predictions.
- A2I workforce options: Private (employees) -> confidential data. MTurk -> general tasks. Vendors -> specialized tasks.
- The feedback loop: A2I reviews -> results feed BACK into the model -> model improves over time.
- Kendra vs. Q Business: Kendra = search service for embedding in apps. Q Business = full enterprise AI assistant.
Practice Questions
Q1. A company's ML model that classifies incoming insurance claims sometimes returns low-confidence predictions. They want a workflow where a human agent reviews all low-confidence claims before they are processed. Which service implements this?
- Amazon Mechanical Turk -- to crowdsource claim review to external workers
- Amazon Augmented AI (A2I) -- to route low-confidence predictions to human reviewers
- Amazon Comprehend -- to re-classify claims with higher confidence
- Amazon Kendra -- to search for similar claims in the archive
Answer: B
Amazon Augmented AI (A2I) is specifically designed to add human review to ML prediction workflows. It routes low-confidence predictions to a designated workforce (employees, MTurk, or vendors) for human validation, ensuring accuracy before business-critical decisions are made.
Q2. A company needs to build a large labeled dataset for training a custom ML model. They need thousands of images labeled by humans. Which AWS service provides access to a crowdsourced workforce for this task?
- Amazon A2I
- Amazon Kendra
- Amazon Mechanical Turk
- Amazon Comprehend
Answer: C
Amazon Mechanical Turk (MTurk) is a crowdsourcing marketplace providing access to a global workforce for simple, scalable human tasks. Its primary use in AI/ML is data labeling -- creating labeled training datasets that supervised ML algorithms require.
Q3. An enterprise wants employees to find answers to HR policy questions by asking natural language questions like 'How many vacation days do I get?' Which service provides ML-powered document search with direct answers?
- Amazon Comprehend
- Amazon Kendra
- Amazon Lex
- Amazon Translate
Answer: B
Amazon Kendra is an ML-powered enterprise document search service that understands natural language questions and returns direct answers (not just links) from indexed documents. Perfect for internal knowledge bases and HR policy queries.
Q4. A company processes medical records and needs human review of extracted data, but the data is confidential and cannot be shared with external workers. Which A2I workforce option should they use?
- Amazon Mechanical Turk workforce
- Vendor workforce from AWS Marketplace
- Private workforce (company employees)
- No workforce -- A2I doesn't support confidential data
Answer: C
A2I supports three workforce options. For confidential or sensitive data, the Private Workforce option uses your own employees as reviewers, ensuring data stays within the organization and maintaining compliance with privacy requirements.
Q5. How does Amazon Kendra improve search accuracy over time?
- By requiring manual tuning after each query
- Through incremental learning from user feedback and clicks
- By reindexing all documents daily
- It cannot improve -- accuracy is fixed at deployment
Answer: B
Kendra uses incremental learning to improve search result relevance over time by learning from user interaction feedback (clicks, ratings, query behavior). The more users interact with search results, the better Kendra becomes at understanding what they're looking for.
Medical AI Services (Transcribe Medical, Comprehend Medical, HealthScribe)
Overview:
AWS has healthcare-specific versions of Transcribe and Comprehend, plus a dedicated clinical documentation service. All are HIPAA-eligible -- they can be used in regulated healthcare environments.
---
Amazon Transcribe Medical:
A specialized version of Transcribe designed for medical speech recognition.
- Converts clinical audio (doctor dictations, patient calls, clinical discussions) into text
- Specialized vocabulary: medical terminology, drug names, procedures, disease names, body parts
- HIPAA-eligible for use in regulated healthcare environments
- Modes: real-time (microphone) and batch (file upload from S3)
Use Cases:
- Physicians dictating medical notes into an EHR (Electronic Health Record) system
- Transcribing drug safety call center recordings
- Converting clinical trial interview audio to text
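As a sketch of the batch mode, the snippet below builds the parameters a `start_medical_transcription_job` call would take via boto3's Transcribe client. The bucket and job names are placeholders; `Specialty` and `Type` are shown with two of the documented option values.

```python
# Sketch: parameters for a batch Transcribe Medical job. A real job is
# started with boto3.client("transcribe").start_medical_transcription_job(**job).
# Bucket names and the job name below are placeholders.

def build_medical_transcription_job(job_name: str, s3_audio_uri: str,
                                    output_bucket: str) -> dict:
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",               # US English medical audio
        "Media": {"MediaFileUri": s3_audio_uri},
        "OutputBucketName": output_bucket,     # transcript JSON lands here
        "Specialty": "PRIMARYCARE",            # medical specialty of the audio
        "Type": "DICTATION",                   # or "CONVERSATION" for multi-speaker audio
    }

job = build_medical_transcription_job(
    "dictation-001",
    "s3://example-audio-bucket/clinic/dictation-001.wav",  # placeholder URI
    "example-output-bucket",
)
print(job["Type"])  # DICTATION
```

`Type` maps directly to the use cases above: `DICTATION` for a physician dictating notes, `CONVERSATION` for patient calls or clinical discussions.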
---
Amazon Comprehend Medical:
A specialized NLP service for understanding and extracting information from clinical text.
- Understands unstructured medical notes, discharge summaries, test results, and case notes
- Extracts medical entities and their relationships:
- Medications (name, dosage, frequency, route)
- Medical conditions and diagnoses
- Anatomical terms
- Test results and lab values
- Detects PHI (Protected Health Information) -- more specific than general PII, covering HIPAA-regulated data
- Works with S3 batch input, Kinesis Data Firehose (real-time streaming), or direct API
Comprehend Medical Relationship Extraction:
Goes beyond entity detection to understand the RELATIONSHIP between terms -- e.g., linking a dosage and frequency to a specific drug name. This enables truly structured medical data from free-form clinical notes.
Common Workflow:
Clinical audio -> Amazon Transcribe Medical -> text -> Amazon Comprehend Medical -> structured medical entities, relationships, and PHI
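The relationship-extraction step can be sketched by flattening a Comprehend Medical-style response into medication records. The sample response below is hand-written to mimic the documented shape (entities with nested attributes); a real one would come from `boto3.client("comprehendmedical").detect_entities_v2(Text=...)` run on the Transcribe Medical output.

```python
# Sketch: linking dosage/frequency attributes to their medication entity,
# as Comprehend Medical's relationship extraction does. sample_response is
# a hand-written stand-in mimicking the DetectEntitiesV2 response shape.

sample_response = {
    "Entities": [
        {
            "Text": "metformin",
            "Category": "MEDICATION",
            "Attributes": [
                {"Type": "DOSAGE", "Text": "500 mg"},
                {"Type": "FREQUENCY", "Text": "twice daily"},
            ],
        }
    ]
}

def extract_medications(response: dict) -> list:
    """Turn medication entities plus their attributes into flat records."""
    records = []
    for entity in response["Entities"]:
        if entity["Category"] != "MEDICATION":
            continue
        attrs = {a["Type"]: a["Text"] for a in entity.get("Attributes", [])}
        records.append({"drug": entity["Text"],
                        "dosage": attrs.get("DOSAGE"),
                        "frequency": attrs.get("FREQUENCY")})
    return records

print(extract_medications(sample_response))
# [{'drug': 'metformin', 'dosage': '500 mg', 'frequency': 'twice daily'}]
```

This is the payoff of relationship extraction: "500 mg" and "twice daily" arrive already attached to "metformin", not as three disconnected strings.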
---
AWS HealthScribe:
A HIPAA-eligible service that automatically generates clinical documentation from patient-clinician conversations.
What It Does in One Step (from audio):
- Creates rich verbatim transcripts
- Identifies who is speaking (speaker role identification -- clinician vs. patient)
- Classifies dialogue segments (symptom description, medical history, assessment, plan)
- Extracts medical terms and concepts
- Generates structured clinical notes automatically
Output Sections:
- Chief complaint
- Medical history
- Assessment and diagnosis
- Treatment plan
Key Benefit:
Reduces physician documentation burden -- doctors spend less time typing notes and more time with patients. All documentation is auto-generated from the conversation recording.
Currently Available In: Select AWS Regions (accessed within the Amazon Transcribe console)
Service Comparison:
| Service | Input | Output | Specialty |
|---|---|---|---|
| Transcribe Medical | Audio | Raw medical text | Medical ASR |
| Comprehend Medical | Medical text | Structured entities, PHI | Medical NLP |
| AWS HealthScribe | Clinical conversation audio | Full clinical notes + transcript | End-to-end clinical documentation |
Key Terms
| Term | Definition |
|---|---|
| Amazon Transcribe Medical | A HIPAA-eligible version of Transcribe specialized for medical speech recognition -- converting clinical audio to text with deep understanding of medical terminology. |
| Amazon Comprehend Medical | A HIPAA-eligible NLP service that extracts medical entities (medications, conditions, procedures), their relationships, and PHI from clinical text. |
| PHI (Protected Health Information) | Health information that identifies a patient and is regulated under HIPAA. Comprehend Medical detects PHI automatically, enabling compliant data handling in healthcare workflows. |
| AWS HealthScribe | A HIPAA-eligible end-to-end service that generates structured clinical notes, transcripts, and medical summaries automatically from recorded patient-clinician conversations. |
| Speaker Role Identification (HealthScribe) | HealthScribe's ability to distinguish between the clinician and the patient in a recorded conversation, enabling properly attributed clinical documentation. |
- Transcribe Medical = SPEECH to TEXT for medical audio. HIPAA-eligible.
- Comprehend Medical = NLP for medical TEXT. Extracts drugs, conditions, dosages, relationships, and PHI.
- HealthScribe = END-TO-END clinical documentation from conversation audio. Generates full clinical notes.
- PHI != PII -- PHI is HIPAA-specific health information. Comprehend Medical detects PHI. Standard Comprehend detects general PII.
- The pipeline: clinical audio -> Transcribe Medical -> text -> Comprehend Medical -> structured data.
- All three services are HIPAA-eligible -- usable in regulated healthcare environments.
Practice Questions
Q1. A hospital wants to automatically create structured clinical notes from patient appointment recordings, including who said what, symptoms discussed, and the treatment plan -- all generated without the physician manually typing anything. Which AWS service is MOST appropriate?
- Amazon Transcribe Medical -- to convert the audio to a verbatim transcript
- Amazon Comprehend Medical -- to extract medical entities from the transcript
- AWS HealthScribe -- to generate full structured clinical notes from the patient-clinician conversation
- Amazon Mechanical Turk -- to have human transcriptionists review and structure the notes
Answer: C
AWS HealthScribe is specifically designed to generate complete structured clinical documentation from patient-clinician conversation audio in a single step -- producing transcripts with speaker attribution, medical entity extraction, and formatted clinical notes. Transcribe Medical and Comprehend Medical would need to be combined and require additional processing steps.
Q2. A pharmaceutical company needs to extract medication names, dosages, and frequencies from unstructured clinical trial notes. Which service is MOST appropriate?
- Amazon Comprehend
- Amazon Comprehend Medical
- Amazon Textract
- Amazon Rekognition
Answer: B
Amazon Comprehend Medical is specialized for medical NLP, extracting medical entities including medications (name, dosage, frequency, route), conditions, and procedures from clinical text. Standard Comprehend doesn't have this medical domain expertise.
Q3. What is the difference between PII (detected by Comprehend) and PHI (detected by Comprehend Medical)?
- They are the same thing
- PII is general personal info; PHI is HIPAA-specific health information
- PHI is for photos; PII is for text
- PII is for US data; PHI is for EU data
Answer: B
PII (Personally Identifiable Information) is general personal data like names and addresses. PHI (Protected Health Information) is specifically defined under HIPAA and includes health-related data. Comprehend Medical detects PHI, which is more specific than general PII.
Q4. What is the typical workflow for processing clinical audio into structured medical data?
- Amazon Polly -> Amazon Translate
- Amazon Transcribe Medical -> Amazon Comprehend Medical
- Amazon Rekognition -> Amazon Textract
- Amazon Lex -> Amazon Personalize
Answer: B
The standard medical audio processing pipeline is: Amazon Transcribe Medical (converts clinical audio to text) -> Amazon Comprehend Medical (extracts structured medical entities, relationships, and PHI from the text).
Q5. Are Amazon Transcribe Medical, Comprehend Medical, and HealthScribe suitable for use in regulated healthcare environments?
- No -- they cannot handle sensitive health data
- Yes -- all three are HIPAA-eligible services
- Only Comprehend Medical is HIPAA-eligible
- Only when used with Amazon Macie
Answer: B
All three services -- Amazon Transcribe Medical, Amazon Comprehend Medical, and AWS HealthScribe -- are HIPAA-eligible, meaning they can be used in regulated healthcare environments when properly configured as part of a HIPAA-compliant architecture.
Amazon EC2 for AI -- Trainium and Inferentia
EC2 Overview (AI Perspective):
Amazon EC2 (Elastic Compute Cloud) provides virtual servers in the cloud. While most AWS AI services are fully managed, some organizations build and train their own large models directly on EC2 instances. For this, AWS has created specialized ML hardware.
Standard GPU-Based EC2 Instances:
For general ML workloads, GPU instances provide the parallel compute needed for deep learning.
| Family | Best For |
|---|---|
| P3, P4, P5 | High-performance ML training (NVIDIA GPUs) |
| G3, G4, G5, G6 | ML inference and some training (NVIDIA GPUs) |
---
AWS Trainium -- Custom ML Training Chips:
AWS-designed ML chips built specifically for training large deep learning models.
- Optimized for training models with 100 billion+ parameters
- EC2 instance type: Trn1 (e.g., trn1.32xlarge has 16 Trainium accelerators)
- Advertised benefit: up to 50% cost reduction vs. comparable GPU-based training instances
- Use when: training very large custom models directly on EC2 (not using SageMaker)
---
AWS Inferentia -- Custom ML Inference Chips:
AWS-designed chips optimized for running ML inference (predictions) at scale and low cost.
- EC2 instance types: Inf1, Inf2
- Benefit: up to 4x higher throughput than comparable GPU instances
- Benefit: up to 70% cost reduction vs. GPU-based inference instances
- Use when: serving a trained model at high volume and need cost-efficient, fast inference
---
Trainium vs. Inferentia:
| Chip | EC2 Type | Use Case | Key Benefit |
|---|---|---|---|
| AWS Trainium | Trn1 | TRAINING deep learning models | ~50% cost reduction vs. GPU training |
| AWS Inferentia | Inf1, Inf2 | INFERENCE (predictions) from trained models | ~4x throughput, ~70% cost reduction vs. GPU inference |
Environmental Note (Exam):
Trainium and Inferentia instances have the LOWEST environmental footprint of all ML compute options on AWS -- because they are the most energy-efficient per unit of ML work performed.
Decision Framework:
- Using a managed service (Bedrock, Rekognition, etc.)? -> No EC2 needed.
- Training a large custom model? -> Trn1 (Trainium) or P-family GPU instances.
- Serving a trained model at scale? -> Inf1/Inf2 (Inferentia) or G-family GPU instances.
- Need lowest cost? -> Trainium for training, Inferentia for inference.
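The decision framework above can be encoded as a small lookup, useful as a memorization aid. The labels are this guide's shorthand, not any AWS API.

```python
# Sketch: the hardware decision framework as a function. Workload labels
# and return strings are this guide's shorthand, not an AWS API.

def pick_compute(workload: str, cost_optimized: bool) -> str:
    if workload == "managed_service":
        return "none (fully managed, no EC2 needed)"
    if workload == "training":
        # Trainium for cost, P-family NVIDIA GPUs for max performance/CUDA
        return "Trn1 (Trainium)" if cost_optimized else "P-family GPU"
    if workload == "inference":
        # Inferentia for cost/throughput, G-family NVIDIA GPUs otherwise
        return "Inf1/Inf2 (Inferentia)" if cost_optimized else "G-family GPU"
    raise ValueError(f"unknown workload: {workload}")

print(pick_compute("training", cost_optimized=True))    # Trn1 (Trainium)
print(pick_compute("inference", cost_optimized=False))  # G-family GPU
```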
Key Terms
| Term | Definition |
|---|---|
| AWS Trainium | AWS-designed ML accelerator chips for training large deep learning models (100B+ parameters). Available on Trn1 EC2 instances with up to 50% cost savings vs. GPU training instances. |
| AWS Inferentia | AWS-designed ML inference chips for running trained models at scale. Available on Inf1 and Inf2 EC2 instances with up to 4x throughput and 70% cost savings vs. GPU inference instances. |
| Trn1 Instance | An EC2 instance type powered by AWS Trainium chips, optimized for training large-scale deep learning models at lower cost than GPU instances. |
| Inf1 / Inf2 Instance | EC2 instance types powered by AWS Inferentia chips, optimized for cost-efficient, high-throughput ML inference at scale. |
- Trainium = TRAINING. Inferentia = INFERENCE. The names hint at their purpose.
- Trn1 = Trainium instances. Inf1/Inf2 = Inferentia instances. Know the instance type naming.
- Trainium = 50% cost savings vs. GPU training. Inferentia = 4x throughput + 70% cost savings vs. GPU inference.
- Both Trainium and Inferentia have the LOWEST ENVIRONMENTAL FOOTPRINT of AWS ML hardware.
- P-family EC2 = GPU-based training (NVIDIA). G-family = GPU-based inference. Trn1 = Trainium. Inf1/Inf2 = Inferentia.
- If exam asks about cost-optimized ML hardware on EC2 -> Trainium (training) or Inferentia (inference).
Practice Questions
Q1. A research team is training a 200-billion parameter language model directly on Amazon EC2. They need to minimize training costs while maintaining high performance. Which EC2 instance family should they use?
- G5 instances -- for GPU-accelerated training with NVIDIA A10G GPUs
- Inf2 instances -- for cost-efficient inference of large language models
- Trn1 instances -- powered by AWS Trainium chips, offering up to 50% cost savings for large model training
- P4 instances -- for highest absolute training performance with NVIDIA A100 GPUs
Answer: C
AWS Trainium chips (Trn1 instances) are specifically optimized for training large deep learning models at up to 50% lower cost than comparable GPU instances. For a 200B parameter model training workload where cost optimization is the priority, Trn1 is the correct choice.
Q2. A company has a trained recommendation model they need to serve to 10 million users daily with very low latency and the lowest possible cost per inference. Which EC2 instance type is MOST appropriate?
- Trn1 -- for high-throughput model training
- Inf2 -- powered by AWS Inferentia, providing up to 4x throughput and 70% cost savings for inference
- P5 -- for maximum GPU compute in inference workloads
- G6 -- for balanced training and inference workloads
Answer: B
AWS Inferentia chips (Inf1/Inf2 instances) are optimized for ML inference at scale -- delivering up to 4x the throughput and 70% cost savings compared to GPU-based inference instances. For high-volume, cost-sensitive inference workloads, Inferentia is the purpose-built solution.
Q3. A company wants to minimize their environmental impact when running ML workloads on AWS. Which hardware option has the lowest environmental footprint?
- Standard EC2 instances
- P5 GPU instances
- AWS Trainium and Inferentia chips
- G6 instances
Answer: C
AWS Trainium and Inferentia have the lowest environmental footprint of all ML compute options on AWS because they are the most energy-efficient per unit of ML work performed. They're purpose-built for ML, unlike general-purpose GPUs.
Q4. What is the primary difference between AWS Trainium and AWS Inferentia?
- Trainium is for images; Inferentia is for text
- Trainium is for TRAINING large models; Inferentia is for INFERENCE (predictions)
- They are the same chip with different names
- Trainium is older; Inferentia is newer
Answer: B
AWS Trainium is designed for TRAINING large deep learning models (100B+ parameters) with up to 50% cost savings. AWS Inferentia is designed for running INFERENCE at scale with up to 4x throughput and 70% cost savings. Different chips for different phases of the ML lifecycle.
Q5. Which EC2 instance type uses AWS Trainium chips?
- P4
- G5
- Trn1
- Inf2
Answer: C
Trn1 instances are powered by AWS Trainium chips, optimized for training large-scale deep learning models. The naming convention is: Trn = Trainium (training), Inf = Inferentia (inference).
Q6. When would you choose P5 GPU instances over Trn1 Trainium instances for ML training?
- When cost optimization is the top priority
- When you need maximum absolute performance and compatibility with NVIDIA CUDA libraries
- When environmental impact is the main concern
- P5 is always better than Trn1
Answer: B
P5 instances with NVIDIA GPUs provide maximum training performance and full compatibility with CUDA libraries and existing GPU-optimized code. Trn1 offers better cost efficiency but may require code adaptation. Choose based on whether cost or compatibility is your priority.