AWS AI Practitioner - Amazon Bedrock and Generative AI (GenAI)
What is Generative AI (GenAI)?
The AI Hierarchy:
Generative AI sits within a broader hierarchy of intelligence systems:
- Artificial Intelligence (AI) - the broadest category; machines simulating human-like thinking
- Machine Learning (ML) - a subset of AI; systems that learn from data without being explicitly programmed
- Deep Learning - a subset of ML; uses multi-layered neural networks to find patterns in large datasets
- Generative AI - a subset of deep learning; models that generate NEW data similar to what they were trained on
ASCII DIAGRAM: AI/ML/DL/GenAI Hierarchy Pyramid
+===========================================+
|       ARTIFICIAL INTELLIGENCE (AI)        |
|  Broadest: Machines simulating thinking   |
|                                           |
|   +-----------------------------------+   |
|   |       MACHINE LEARNING (ML)       |   |
|   |    Learning from data patterns    |   |
|   |                                   |   |
|   |   +---------------------------+   |   |
|   |   |    DEEP LEARNING (DL)     |   |   |
|   |   | Neural networks / layers  |   |   |
|   |   |                           |   |   |
|   |   |   +-------------------+   |   |   |
|   |   |   |   GENERATIVE AI   |   |   |   |
|   |   |   | Creates NEW data  |   |   |   |
|   |   |   +-------------------+   |   |   |
|   |   +---------------------------+   |   |
|   +-----------------------------------+   |
+===========================================+
Each inner box is a SUBSET of the outer boxes.
GenAI is the most specialized category.
What Can GenAI Generate?
Models can be trained on and generate virtually any data type: text, images, audio, video, code, and more.
Foundation Models (FMs):
The backbone of modern GenAI. A Foundation Model is a large, general-purpose model trained on massive amounts of unlabeled data that can be adapted to many different tasks.
- Cost tens of millions of dollars to train
- Require enormous computational resources and time
- Only a handful of large companies build them from scratch
- Can perform: text generation, summarization, information extraction, image generation, Q&A, and more
Who Builds Foundation Models?
- OpenAI (GPT-4o -- powers ChatGPT)
- Anthropic (Claude)
- Amazon (Titan, Nova)
- Meta (Llama -- open source)
- Google (BERT, Gemini)
- Mistral AI, Cohere, Stability AI, and more
Some models are open source (free to use), others require commercial licensing.
Large Language Models (LLMs):
A specific type of Foundation Model designed to understand and generate human-like text.
- Trained on billions of words from books, websites, articles, and more
- Respond to a natural language input called a prompt
- Can translate, summarize, answer questions, write code, and create content
- Output is non-deterministic -- the same prompt can produce different results each time
Why is GenAI Output Non-Deterministic?
LLMs generate text word-by-word (token-by-token) using statistical probabilities, not fixed rules. For each position, the model assigns probabilities to possible next words and randomly samples from them. Since this process is probabilistic, the same prompt yields slightly different outputs each time.
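This sampling step can be sketched in a few lines of Python. The softmax-with-temperature helper below is a toy illustration (real LLMs score tens of thousands of candidate tokens, and the name `sample_next_token` is invented here), but it shows why two runs over the same scores can pick different tokens:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Pick the next token by sampling from a softmax distribution.

    `logits` maps candidate tokens to raw scores. Higher temperature
    flattens the distribution (more variety); lower sharpens it.
    """
    rng = random.Random(seed)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # Sampling (not argmax) is the source of non-determinism: runs with
    # different random states can legitimately return different tokens.
    return rng.choices(list(probs), weights=list(probs.values()))[0]

logits = {"cat": 2.0, "dog": 1.8, "car": 0.2}
print(sample_next_token(logits, seed=1))
print(sample_next_token(logits, seed=7))
```

With temperature near 0 the distribution collapses onto the highest-scoring token and output becomes effectively deterministic; raising it spreads probability across more candidates.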
GenAI for Images -- Diffusion Models:
One popular approach for image generation:
- Forward diffusion - training phase where noise is progressively added to images until they become pure noise
- Reverse diffusion - generation phase where the model starts from random noise and removes it step-by-step, guided by a text prompt, to produce a new image
This is the mechanism behind models like Stable Diffusion.
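The forward phase can be illustrated with a toy sketch, assuming a 1-D "image" of a few pixels and a simple linear noise schedule (real diffusion models use learned variance schedules over full image tensors; `forward_diffusion` is an invented name):

```python
import math
import random

def forward_diffusion(pixels, num_steps, seed=0):
    """Progressively mix Gaussian noise into a 1-D 'image'.

    At step t the original signal is weighted by sqrt(1 - t/T) and fresh
    noise by sqrt(t/T), so by the final step essentially nothing of the
    original remains. Reverse diffusion (the generative phase) learns to
    undo this process step by step, which is not shown here.
    """
    rng = random.Random(seed)
    history = [list(pixels)]
    for t in range(1, num_steps + 1):
        keep = math.sqrt(1.0 - t / num_steps)   # signal weight shrinks to 0
        noise_w = math.sqrt(t / num_steps)      # noise weight grows to 1
        history.append([keep * p + noise_w * rng.gauss(0.0, 1.0)
                        for p in pixels])
    return history

steps = forward_diffusion([0.9, 0.1, 0.5, 0.7], num_steps=4)
print(steps[0])   # the original pixels
print(steps[-1])  # (almost) pure noise
```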
Types of Machine Learning:
- Supervised Learning - learns from labeled data (input + correct output). Example: spam detection
- Unsupervised Learning - finds patterns in unlabeled data. Example: customer segmentation
- Reinforcement Learning - learns through trial and error with rewards/penalties. Example: game-playing AI
- Self-Supervised Learning - creates its own labels from data. Foundation models use this approach
Transformer Architecture:
The neural network architecture behind modern LLMs. Key innovation: the 'attention mechanism' that allows the model to weigh the importance of different words in a sentence when generating output.
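A minimal sketch of that attention mechanism, using plain Python lists rather than a tensor library (the function name and tiny example vectors are illustrative only):

```python
import math

def scaled_dot_product_attention(queries, keys, values):
    """Scaled dot-product attention over small Python lists.

    For each query: score every key (dot product divided by sqrt(d)),
    turn the scores into weights with softmax, and return the weighted
    average of the value vectors. The weights are the 'attention' --
    how much each input position influences this output position.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of value vectors, one component at a time
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# One query attending over two key/value pairs; the output leans toward
# the value whose key most resembles the query:
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
)
print(out)
```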
Key Terms
| Term | Definition |
|---|---|
| Generative AI | A category of AI models that generate new content (text, images, audio, etc.) that is statistically similar to the data they were trained on. |
| Foundation Model (FM) | A large, general-purpose AI model trained on vast unlabeled data that can be adapted to many downstream tasks. Expensive to train; only a few companies build them. |
| Large Language Model (LLM) | A type of Foundation Model specifically designed to understand and generate coherent human-like text using probabilistic token prediction. |
| Prompt | The input text you send to a GenAI model -- can be a question, instruction, or context. The model's response is shaped by the prompt. |
| Non-Deterministic Output | The property of LLMs where the same prompt can produce different results each time, because word selection is probability-based, not rule-based. |
| Diffusion Model | An image generation technique that trains by adding noise to images and then learns to reverse that process to generate new images from noise guided by a prompt. |
| Token | The basic unit of text an LLM processes. A token is roughly a word or word fragment. Models are billed and limited by token count. |
| Transformer | The neural network architecture behind modern LLMs. Uses attention mechanisms to process sequences of data in parallel and understand context across long text spans. |
| Attention Mechanism | A technique in transformers that allows the model to focus on different parts of the input when generating each part of the output, enabling understanding of context and relationships. |
| Self-Supervised Learning | A machine learning approach where the model generates its own training labels from the structure of the data itself. Foundation models are trained using this technique. |
| Neural Network | A computing system inspired by biological brains, consisting of interconnected nodes (neurons) organized in layers that process information and learn patterns from data. |
| Pre-Training | The initial phase where a Foundation Model is trained on massive datasets to learn general language patterns before being adapted for specific tasks. |
- GenAI is a SUBSET of deep learning, which is a subset of ML, which is a subset of AI. Know the hierarchy.
- Foundation Models are trained ONCE on massive data and then reused/adapted -- they are NOT retrained per user.
- Non-deterministic = same input, DIFFERENT output. This is by design, not a bug.
- LLMs generate text token by token using PROBABILITIES -- this is why output varies.
- The exam may ask what type of model is used for image generation -- think diffusion models (Stable Diffusion).
- The TRANSFORMER architecture is the foundation of modern LLMs -- know this term for the exam.
- Foundation Models use SELF-SUPERVISED learning -- they create their own labels during pre-training.
- Pre-training is EXPENSIVE (millions of dollars) -- this is why only large companies build FMs from scratch.
- LLMs are ONE TYPE of Foundation Model -- not all FMs are LLMs (some are image models, multimodal, etc.).
- The ATTENTION mechanism is what makes transformers powerful -- it allows understanding context across long sequences.
Practice Questions
Q1. Which of the following correctly describes the relationship between AI, Machine Learning, Deep Learning, and Generative AI?
- Generative AI contains Deep Learning, which contains Machine Learning, which contains AI
- AI contains Machine Learning, which contains Deep Learning, which contains Generative AI
- Machine Learning and Generative AI are equal subsets of AI
- Deep Learning is a superset of all other categories
Answer: B
The hierarchy goes from broadest to most specific: AI -> Machine Learning -> Deep Learning -> Generative AI. Generative AI is the most specialized subset.
Q2. A developer notices that sending the exact same prompt to an LLM twice produces two slightly different responses. What is the most accurate explanation for this behavior?
- The model has a memory leak that corrupts previous outputs
- LLM output is non-deterministic because token selection is based on probabilities, not fixed rules
- The model retrains itself between each query
- The API randomly shuffles output for security purposes
Answer: B
LLMs generate each token by sampling from a probability distribution of possible next words. This statistical sampling means the same prompt can yield different -- but equally valid -- outputs each time.
Q3. What neural network architecture powers most modern Large Language Models like GPT-4 and Claude?
- Convolutional Neural Network (CNN)
- Recurrent Neural Network (RNN)
- Transformer
- Generative Adversarial Network (GAN)
Answer: C
The Transformer architecture, introduced in 2017, revolutionized NLP with its attention mechanism. All major modern LLMs including GPT-4, Claude, Llama, and Gemini are built on transformer architecture.
Q4. A company wants to build an AI that generates realistic product images from text descriptions. Which type of generative AI model is MOST suitable for this task?
- Large Language Model (LLM)
- Diffusion Model
- Recurrent Neural Network
- Random Forest Classifier
Answer: B
Diffusion models are specifically designed for image generation. They learn to generate images by reversing a noise-addition process, guided by text prompts. Models like Stable Diffusion and DALL-E use this approach.
Q5. Which learning approach do Foundation Models primarily use during their initial pre-training phase?
- Supervised Learning with manually labeled datasets
- Reinforcement Learning with human feedback
- Self-Supervised Learning that generates labels from data structure
- Unsupervised clustering of data points
Answer: C
Foundation Models use self-supervised learning during pre-training, where the model creates its own labels from the data (e.g., predicting masked words or next tokens). This allows training on massive unlabeled datasets.
Amazon Bedrock - Overview
What is Amazon Bedrock?
Amazon Bedrock is a fully managed AWS service for building generative AI applications. It gives you access to a wide selection of Foundation Models from multiple providers through a single, unified API -- without having to manage any infrastructure.
ASCII DIAGRAM: Amazon Bedrock Architecture Overview
+-----------------------------------------------------------------------------+
|                               AMAZON BEDROCK                                |
|                                                                             |
|  +-----------------------------------------------------------------------+  |
|  |                           UNIFIED API LAYER                           |  |
|  |             (Single interface for ALL models & features)              |  |
|  +-----------------------------------------------------------------------+  |
|        |                 |                 |                 |              |
|        v                 v                 v                 v              |
|  +--------------+  +--------------+  +--------------+  +--------------+    |
|  |    AMAZON    |  |  ANTHROPIC   |  |     META     |  |  STABILITY   |    |
|  |  Titan/Nova  |  |    Claude    |  |    Llama     |  |      AI      | ...|
|  +--------------+  +--------------+  +--------------+  +--------------+    |
|                                                                             |
|  +-----------------------------------------------------------------------+  |
|  |                           BEDROCK FEATURES                            |  |
|  | +-----------+ +-----------+ +-----------+ +-----------+ +-----------+ |  |
|  | | Knowledge | |   Fine-   | |  Agents   | | Guardrails| |   Model   | |  |
|  | |   Bases   | |  Tuning   | |           | |           | | Evaluation| |  |
|  | |   (RAG)   | |           | |           | |           | |           | |  |
|  | +-----------+ +-----------+ +-----------+ +-----------+ +-----------+ |  |
|  +-----------------------------------------------------------------------+  |
|                                                                             |
|  +-----------------------------------------------------------------------+  |
|  |         DATA PRIVACY: Your data NEVER leaves your AWS account         |  |
|  |         Your data is NEVER used to train provider models              |  |
|  +-----------------------------------------------------------------------+  |
+-----------------------------------------------------------------------------+
Key Characteristics:
- Fully Managed - no servers to provision, patch, or scale
- Unified API - one consistent interface to access all available models
- Pay-Per-Use - charged based on tokens processed or images generated
- Data Privacy - your data never leaves your AWS account; it is never used to train the provider's original model
- Private Copy - when you use a model, Bedrock creates a private copy for you
Foundation Model Providers on Bedrock:
AI21 Labs, Anthropic, Amazon (Titan & Nova), Cohere, Meta, Mistral AI, Stability AI, and more -- with new providers added over time.
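With boto3, the unified API looks roughly like the sketch below. The request-building helper is an invented convenience; the `converse` call exists in the `bedrock-runtime` client, but the model IDs shown are examples whose availability varies by region and requires granted model access, so treat this as a hedged sketch rather than production code:

```python
def build_converse_request(model_id, user_text):
    """Build a request for Bedrock's Converse API.

    The same message shape works for every provider on Bedrock --
    switching from Claude to Titan/Nova means changing only `model_id`.
    """
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": 256, "temperature": 0.5},
    }

def invoke(request):
    """Send the request through the unified API. Requires AWS credentials
    and model access granted in the Bedrock console; not executed here."""
    import boto3  # imported lazily so the sketch runs without AWS installed
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.converse(**request)
    return response["output"]["message"]["content"][0]["text"]

# Same code path, two different providers -- only the model ID changes:
for model_id in ("anthropic.claude-3-haiku-20240307-v1:0",
                 "amazon.titan-text-express-v1"):
    request = build_converse_request(model_id, "Summarize our return policy.")
    print(request["modelId"])
```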
Core Capabilities of Amazon Bedrock:
| Capability | Description |
|---|---|
| Playground | Interactive console to test and compare models |
| Knowledge Bases (RAG) | Connect external data sources for up-to-date, accurate responses |
| Fine-Tuning | Customize a model copy with your own data |
| Agents | Enable models to autonomously plan and execute multi-step tasks |
| Guardrails | Filter harmful content, enforce topic restrictions, mask PII |
| Model Evaluation | Automatically or manually score model quality |
| CloudWatch Integration | Log and monitor all model invocations |
Bedrock Playground:
An interactive interface within the Bedrock console that lets you:
- Browse models via the Model Catalog (filter by provider, capability)
- Test models with text/chat prompts or image generation prompts
- Compare two models side-by-side for quality, speed, and cost
- See token counts, latency, and output formatting differences per model
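The catalog filtering the console performs can also be done programmatically. The sketch below assumes the response shape of the ListFoundationModels API (control-plane `bedrock` client); the sample catalog entries are illustrative, not a live listing:

```python
def filter_models(model_summaries, provider=None, output_modality=None):
    """Filter Model Catalog entries the way the console filters do.

    `model_summaries` mirrors the shape returned by the Bedrock
    ListFoundationModels API (a list of dicts with modelId,
    providerName, outputModalities, ...).
    """
    results = []
    for m in model_summaries:
        if provider and m["providerName"] != provider:
            continue
        if output_modality and output_modality not in m["outputModalities"]:
            continue
        results.append(m["modelId"])
    return results

def fetch_catalog():
    """Real call -- needs AWS credentials; shown here but not executed."""
    import boto3
    client = boto3.client("bedrock", region_name="us-east-1")
    return client.list_foundation_models()["modelSummaries"]

# Illustrative sample in the same shape as the API response:
catalog = [
    {"modelId": "anthropic.claude-3-haiku-20240307-v1:0",
     "providerName": "Anthropic", "outputModalities": ["TEXT"]},
    {"modelId": "stability.stable-diffusion-xl-v1",
     "providerName": "Stability AI", "outputModalities": ["IMAGE"]},
]
print(filter_models(catalog, output_modality="IMAGE"))
```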
Bedrock Supported Use Cases:
- Chatbots and virtual assistants
- Content generation and summarization
- Code generation and debugging
- Semantic search and Q&A systems
- Image and video generation
- Document analysis and data extraction
- Translation and localization
Key Terms
| Term | Definition |
|---|---|
| Amazon Bedrock | A fully managed AWS service that provides access to multiple Foundation Models via a single unified API, enabling GenAI application development without infrastructure management. |
| Unified API | A single, standardized way to interact with all models on Bedrock, regardless of which provider's model you choose. Your application code doesn't change when you swap models. |
| Model Catalog | The browsable directory within Bedrock where you can discover, filter, and select Foundation Models by provider and capability (text, image, embeddings, etc.). |
| Fully Managed Service | A cloud service where AWS handles all infrastructure management -- no provisioning, patching, or scaling needed from the customer. |
| Bedrock Playground | An interactive console interface in Amazon Bedrock for testing and comparing Foundation Models before integrating them into applications. |
| Model Provider | A company that builds and offers Foundation Models on Amazon Bedrock, such as Anthropic (Claude), Meta (Llama), Stability AI (Stable Diffusion), or Amazon (Titan, Nova). |
| Serverless | A cloud computing model where the provider manages all infrastructure. Bedrock is serverless -- you focus on using models, not managing servers. |
| InvokeModel API | The primary Bedrock API call used to send prompts to a Foundation Model and receive generated responses. The same API works across all providers. |
| Model Access | Before using a model on Bedrock, you must request and be granted access to it. Some models require acceptance of End User License Agreements (EULAs). |
- Bedrock is the PRIMARY AWS service for GenAI -- if an exam question involves building a GenAI app on AWS, the answer likely involves Bedrock.
- Your data in Bedrock NEVER leaves your account and is NEVER used to retrain the provider's model. This is a key data privacy guarantee.
- Bedrock uses a UNIFIED API -- one way to call all models. You don't need a different SDK per model.
- The Bedrock playground is for TESTING -- real applications use the Bedrock API programmatically.
- Know the six core Bedrock capabilities: Playground, Knowledge Bases, Fine-Tuning, Agents, Guardrails, Evaluation.
- Bedrock is SERVERLESS -- no EC2 instances, no capacity planning, no infrastructure management needed.
- You must REQUEST ACCESS to models before using them -- access is not automatic for all models.
- Bedrock creates a PRIVATE COPY of models for your use -- you're not sharing a model instance with other customers.
- Switching between model providers (e.g., Claude to Titan) requires ONLY changing the model ID -- no code rewrite needed.
- Bedrock integrates natively with other AWS services: S3, Lambda, CloudWatch, IAM, VPC, and more.
Practice Questions
Q1. A company wants to build a GenAI application on AWS that can switch between different AI providers (e.g., Anthropic and Amazon) without rewriting application code. Which AWS service best supports this requirement?
- Amazon SageMaker
- Amazon Rekognition
- Amazon Bedrock
- AWS Lambda
Answer: C
Amazon Bedrock provides a unified API that works consistently across all supported Foundation Model providers. Swapping models requires only a model ID change, not a code rewrite.
Q2. A data privacy officer is concerned that using Amazon Bedrock will expose their company's proprietary training data to third-party AI providers. What should you tell them?
- Their data may be used to improve provider models, so they should encrypt it
- Amazon Bedrock keeps all customer data within the customer's AWS account and never shares it with model providers for training
- Customers must sign a separate NDA with each model provider on Bedrock
- Only AWS-native models (Titan, Nova) guarantee data privacy on Bedrock
Answer: B
A core data privacy guarantee of Amazon Bedrock is that your data -- including prompts and fine-tuning data -- stays within your AWS account and is never sent back to model providers for training their base models.
Q3. A startup with limited DevOps resources wants to build a GenAI chatbot. They want to avoid managing servers, scaling infrastructure, or patching systems. Which characteristic of Amazon Bedrock addresses this need?
- On-demand pricing model
- Multi-provider model catalog
- Fully managed serverless architecture
- Provisioned Throughput guarantee
Answer: C
Amazon Bedrock is fully managed and serverless. AWS handles all infrastructure including provisioning, scaling, patching, and maintenance. The startup can focus entirely on building their chatbot without DevOps overhead.
Q4. Before a developer can use Claude models on Amazon Bedrock, what step must they complete?
- Deploy an EC2 instance to host the model
- Request and receive access to the model in the Bedrock console
- Sign a contract directly with Anthropic
- Configure a VPC endpoint for Claude specifically
Answer: B
Before using any model on Bedrock, you must request access through the Bedrock console. Some models require accepting an End User License Agreement (EULA). Access is not automatically granted for all models.
Q5. A solutions architect is comparing AWS services for a text generation use case. What is the PRIMARY advantage of Amazon Bedrock over Amazon SageMaker for accessing pre-built Foundation Models?
- Bedrock offers lower pricing for all models
- Bedrock provides immediate access to multiple providers via a unified API without infrastructure management
- Bedrock supports custom model training from scratch
- Bedrock offers more GPU instance types
Answer: B
Bedrock's primary advantage is providing instant access to multiple Foundation Models from various providers through a single unified API, with no infrastructure to manage. SageMaker is better suited for custom model training and hosting, not pre-built FM access.
Amazon Bedrock - Foundation Model Selection
Choosing the Right Foundation Model:
There is no single 'best' model -- selection depends on your use case, budget, and requirements. Key factors to evaluate:
- Model type - text-only vs. multimodal (accepts and/or generates text, image, audio, and video together)
- Context window - maximum number of tokens the model can process at once; larger = more memory and coherence
- Latency - how fast the model responds; smaller models are generally faster
- Pricing - cost per 1,000 input/output tokens; varies significantly by model
- Licensing - open source vs. commercial
- Customizability - whether the model supports fine-tuning
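To make the pricing factor concrete, a back-of-the-envelope token-cost estimate might look like this (the helper name and all prices are hypothetical; check the current Bedrock pricing page for real per-model rates):

```python
def estimate_monthly_cost(calls_per_month, avg_input_tokens, avg_output_tokens,
                          input_price_per_1k, output_price_per_1k):
    """Rough monthly bill for on-demand, token-based pricing.

    Prices are per 1,000 tokens and purely illustrative -- real rates
    differ per model and are typically higher for output tokens.
    """
    input_cost = calls_per_month * avg_input_tokens / 1000 * input_price_per_1k
    output_cost = calls_per_month * avg_output_tokens / 1000 * output_price_per_1k
    return round(input_cost + output_cost, 2)

# 100k chatbot calls/month, short prompts, hypothetical prices:
print(estimate_monthly_cost(100_000, 400, 200, 0.0005, 0.0015))  # → 50.0
```

Running the same numbers against two or three candidate models quickly shows whether a cheaper, smaller model is worth testing first.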
Model Comparison (Exam-Relevant Examples):
| Model | Provider | Best For | Context Window | Notes |
|---|---|---|---|---|
| Titan Text Express | Amazon | Content creation, classification | 8K tokens | Very cost-effective |
| Llama 2 | Meta | Dialogue, text generation | 4K tokens | Open source |
| Claude | Anthropic | Analysis, large document Q&A | 200K tokens | Large context window |
| Stable Diffusion | Stability AI | Image generation ONLY | N/A | Not for text tasks |
Amazon Titan -- Key Model Family to Know for the Exam:
Amazon's own high-performing Foundation Models, available directly on Bedrock.
- Supports text, images, and multimodal tasks
- Customizable with your own data (fine-tuning)
- Accessible via the same Bedrock unified API
- Competitively priced
General Guidance:
- Smaller models = cheaper + faster + less capable
- Larger context window = handle larger documents and code bases
- Multimodal = accepts and generates text, images, audio, and video together
- Always test multiple models against your real workload before committing
Model Selection Decision Framework:
- Define your primary task (text gen, image gen, embeddings, etc.)
- Estimate input/output sizes (do you need a large context window?)
- Determine latency requirements (real-time vs. batch)
- Set budget constraints (tokens/month or images/month)
- Check fine-tuning requirements (not all models support it)
- Test 2-3 candidate models in Bedrock Playground before deciding
Open Source vs. Commercial Models:
- Open Source (e.g., Llama) - free to use, can be self-hosted, community-driven improvements
- Commercial (e.g., Claude, GPT-4) - licensing required, better support, often higher quality
Key Terms
| Term | Definition |
|---|---|
| Context Window | The maximum number of tokens a model can consider at once during generation. Larger context windows allow processing of longer documents or conversations. |
| Multimodal Model | A model that can accept and/or produce multiple types of data simultaneously -- for example, taking image + text as input and returning text output. |
| Amazon Titan | AWS's own family of Foundation Models available on Bedrock. Supports text and image tasks, is customizable, and is accessible via the Bedrock unified API. |
| Latency (Model) | The time it takes for a model to generate a complete response after receiving a prompt. Smaller, simpler models generally have lower latency. |
| Open Source Model | A Foundation Model whose weights and architecture are publicly available for free use, modification, and self-hosting. Examples: Llama, Mistral. |
| Commercial Model | A Foundation Model that requires licensing or payment to use. Access is controlled by the provider. Examples: Claude, GPT-4. |
| Model Parameters | The number of trainable weights in a neural network. Larger parameter counts generally mean more capability but also higher cost and latency. |
| Inference | The process of using a trained model to generate predictions or outputs from new input data. Each Bedrock API call performs inference. |
| Model Benchmark | Standardized tests used to compare model performance across tasks like reasoning, coding, and knowledge. Examples: MMLU, HumanEval. |
- Amazon Titan is AWS's OWN model family -- expect exam questions asking which model is native to AWS.
- Claude's distinguishing feature is its VERY LARGE context window (200K tokens) -- ideal for analyzing large documents or codebases.
- Stable Diffusion = images ONLY. If a question asks about text generation, Stable Diffusion is a wrong answer.
- Larger context window -> more memory -> higher cost per call. It's a tradeoff.
- Multimodal = can handle MULTIPLE input/output types at the same time. This is different from a model that does either text OR images.
- Llama is META's model family and is OPEN SOURCE -- key distinction from commercial models.
- More parameters = more capable but also SLOWER and MORE EXPENSIVE to run.
- Always TEST models in the Bedrock Playground before committing to production -- there's no single best model.
- If the exam mentions 'open source Foundation Model on Bedrock' -- think Llama or Mistral.
- Context window size should match your use case -- don't pay for 200K tokens if you only need 4K.
Practice Questions
Q1. A legal firm needs to upload 500-page contracts to an AI model and ask questions about their content. Which model characteristic is MOST important to prioritize?
- Low latency
- Image generation capability
- Large context window
- Open source licensing
Answer: C
A 500-page document contains a massive number of tokens. A large context window (like Claude's 200K tokens) allows the model to hold the entire document in memory at once and answer questions about it coherently.
Q2. Which Amazon Bedrock Foundation Model is built and maintained directly by AWS?
- Claude
- Llama 2
- Amazon Titan
- Stable Diffusion
Answer: C
Amazon Titan is AWS's own Foundation Model family, available on Bedrock. Claude is from Anthropic, Llama 2 is from Meta, and Stable Diffusion is from Stability AI -- all third-party providers accessed through Bedrock.
Q3. A company wants to use a Foundation Model without paying licensing fees and potentially host it on their own infrastructure later. Which model type should they choose?
- Claude (Anthropic)
- Llama (Meta)
- Stable Diffusion XL
- Amazon Nova Premier
Answer: B
Llama from Meta is an open source model family available on Bedrock. Open source models are free to use and can be self-hosted without licensing fees, making them ideal for this requirement.
Q4. A real-time customer service chatbot needs to respond within 1 second for good user experience. The conversations are short (under 500 tokens each). Which model selection strategy is MOST appropriate?
- Choose the model with the largest context window
- Choose a smaller, faster model optimized for low latency
- Choose an image generation model for visual responses
- Choose the most expensive enterprise model
Answer: B
For real-time chatbots with short conversations, low latency is critical. Smaller models respond faster and are sufficient for short conversations. A large context window is unnecessary and would add latency and cost.
Q5. A data scientist is evaluating two models for a text classification task. Model A has 7 billion parameters and Model B has 70 billion parameters. What trade-off should they expect?
- Model A will be more accurate but slower
- Model B will likely be more capable but also more expensive and slower
- Parameter count has no impact on performance
- Model A will have a larger context window
Answer: B
Larger parameter counts generally mean more capability and accuracy but come with higher computational costs (more expensive per token) and higher latency. The data scientist should test both to see if Model A's performance is sufficient for the use case.
Amazon Bedrock - Fine-Tuning a Model
What is Fine-Tuning?
Fine-tuning adapts a COPY of a Foundation Model to your specific use case by training it further on your own data. It modifies the model's internal weights, making it better suited to your domain -- without building a model from scratch.
ASCII DIAGRAM: Foundation Model -> Fine-Tuning -> Inference Pipeline
+-----------------------------------------------------------------------------+
|                    FINE-TUNING PIPELINE ON AMAZON BEDROCK                   |
+-----------------------------------------------------------------------------+
 +--------------+       +------------------+       +------------------+
 |   BASE FM    |       |  YOUR TRAINING   |       |   FINE-TUNED     |
 | (e.g., Titan)|   +   |    DATA IN S3    |   =   |   MODEL COPY     |
 |   Original   |       |   (JSON/JSONL)   |       | (Custom weights  |
 |   Weights    |       |                  |       |  in your acct)   |
 +------+-------+       +--------+---------+       +--------+---------+
        |                        |                          |
        v                        v                          v
 +--------------------------------------------------------------------------+
 |                         BEDROCK FINE-TUNING JOB                          |
 |   * Creates private copy of the base model                               |
 |   * Adjusts weights using your labeled/unlabeled data                    |
 |   * Training runs in your AWS account (data never leaves)                |
 +--------------------------------------------------------------------------+
                                     |
                                     v
 +--------------------------------------------------------------------------+
 |                        INFERENCE (USING THE MODEL)                       |
 |   * REQUIRES Provisioned Throughput (NOT on-demand)                      |
 |   * Monthly commitment for guaranteed capacity                           |
 |   * Call via same Bedrock API with your custom model ID                  |
 +--------------------------------------------------------------------------+
                                     |
                                     v
 +---------------+       +----------------+       +--------------------+
 |  User Prompt  | --->  |  Custom Model  | --->  |  Domain-Optimized  |
 |               |       | (Your Weights) |       |      Response      |
 +---------------+       +----------------+       +--------------------+
- Your private copy is stored in your AWS account
- Training data must be stored in Amazon S3
- Not all models on Bedrock support fine-tuning (check documentation)
- Fine-tuned models CANNOT run on-demand -- they require Provisioned Throughput
Three Fine-Tuning Techniques on Bedrock:
1. Supervised Fine-Tuning
- Trains the model using LABELED input/output pairs
- You provide: prompt -> expected completion (e.g., question -> ideal answer)
- Best for: adapting a model to a specific domain or task where you know the correct answer
- Exam keyword: 'labeled data' or 'input/output pairs' -> Supervised Fine-Tuning
- Example data format: { "prompt": "What is your return policy?", "completion": "Returns accepted within 30 days." }
2. Reinforcement Fine-Tuning
- Trains the model using only INPUTS + a REWARD FUNCTION (no labeled outputs)
- The model generates multiple responses -> each is scored by the reward function -> scores feed back to improve the model
- Reward function can be:
- Objective tasks (code correctness, math) -> use AWS Lambda to write scoring logic
- Subjective tasks (tone, empathy, quality) -> use a judge model with evaluation instructions
- Best for: complex multi-step reasoning, conversational tone refinement, customer service quality
- Iterative process: model improves over many rounds of feedback
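For an objective task, a Lambda-based reward function might look like the sketch below. The event schema here is hypothetical (adapt it to whatever shape your fine-tuning job actually sends); only the scoring idea is the point:

```python
def lambda_handler(event, context=None):
    """Score a model response for an objective task.

    Toy example: the 'task' is exact-answer arithmetic. Full reward for
    an exact match, partial credit if the answer appears anywhere in the
    text, zero otherwise. The event keys are invented for illustration.
    """
    expected = str(event["expected_answer"])
    response = event["model_response"].strip()
    if response == expected:
        reward = 1.0
    elif expected in response:
        reward = 0.5
    else:
        reward = 0.0
    return {"reward": reward}

print(lambda_handler({"expected_answer": 12, "model_response": "12"}))
print(lambda_handler({"expected_answer": 12, "model_response": "It is 12."}))
```

For subjective tasks (tone, empathy), this code-based scorer would be replaced by a judge model given evaluation instructions.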
3. Distillation
- A LARGER teacher model trains a SMALLER student model
- The student learns from the teacher's inputs AND outputs
- Result: a smaller, faster, cheaper model that behaves similarly to the larger one
- Cost reduction: up to 75% cheaper than the original model
- Trade-off: slight reduction in accuracy vs. the teacher model
- Best for: production use cases where speed and cost matter and some accuracy loss is acceptable
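The data-generation side of distillation can be sketched as follows, with a stand-in teacher function (a real teacher would be an invocation of the larger Foundation Model; `distill_dataset` is an invented name):

```python
def distill_dataset(prompts, teacher):
    """Build a student training set by letting the teacher answer
    unlabeled prompts. The smaller student model is then fine-tuned on
    these prompt/completion pairs so it mimics the teacher's behavior.
    """
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]

# Stand-in teacher for illustration only:
teacher = lambda p: "Answer to: " + p
print(distill_dataset(["What is your return policy?"], teacher))
```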
Supervised vs. Reinforcement Fine-Tuning -- Quick Comparison:
| Aspect | Supervised | Reinforcement |
|---|---|---|
| Input data | Labeled (input + output pairs) | Unlabeled (inputs only) |
| Output evaluation | Compared to labeled answer | Scored by reward function |
| Process | Single pass | Iterative, feedback loop |
| Best for | Domain adaptation, task improvement | Tone, reasoning, behavior shaping |
Inference Pricing for Fine-Tuned Models:
- On-Demand - pay per token; for base models only
- Provisioned Throughput - required for fine-tuned/custom models; pay per month for reserved capacity; guarantees max tokens per minute
Continued Pre-Training:
An additional technique where you continue training a base model on UNLABELED domain-specific text. This teaches the model new domain vocabulary and concepts before supervised fine-tuning. Useful for specialized industries (medical, legal, finance).
Fine-Tuning Data Requirements:
- Minimum: typically 100-1000 examples for supervised fine-tuning
- Format: JSONL (JSON Lines) stored in Amazon S3
- Quality matters more than quantity -- well-curated examples produce better results
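Producing that JSONL file locally might look like this, using the same {"prompt": ..., "completion": ...} shape shown earlier (the helper and file name are illustrative; the resulting file would be uploaded to S3 before launching the fine-tuning job):

```python
import json
import os
import tempfile

def write_training_jsonl(examples, path):
    """Write supervised fine-tuning pairs as JSON Lines: one JSON object
    per line. Upload the resulting file to S3 before starting the job.
    """
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in examples:
            f.write(json.dumps({"prompt": prompt,
                                "completion": completion}) + "\n")
    return path

examples = [
    ("What is your return policy?", "Returns accepted within 30 days."),
    ("Do you ship internationally?", "Yes, to over 40 countries."),
]
path = os.path.join(tempfile.gettempdir(), "bedrock_train.jsonl")
write_training_jsonl(examples, path)
print(open(path, encoding="utf-8").read())
```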
Key Terms
| Term | Definition |
|---|---|
| Fine-Tuning | The process of further training a copy of a Foundation Model on your own domain-specific data to improve its performance on targeted tasks. |
| Supervised Fine-Tuning | Fine-tuning using labeled input/output pairs where the model learns from examples with known correct answers. |
| Reinforcement Fine-Tuning | Fine-tuning where the model generates multiple responses to inputs and iteratively improves based on scores from a reward function -- no labeled outputs needed. |
| Reward Function | A scoring mechanism used in reinforcement fine-tuning that evaluates the quality of a model's response. Can be code-based (Lambda) for objective tasks or a judge model for subjective tasks. |
| Distillation | A fine-tuning technique where a large teacher model trains a smaller student model, producing a cheaper, faster model with similar behavior. |
| Provisioned Throughput | A pricing model required for fine-tuned or custom models on Bedrock. You reserve capacity and pay monthly for a guaranteed maximum token throughput. |
| Judge Model | A second AI model used in reinforcement fine-tuning to evaluate and score responses from the model being trained, particularly for subjective or qualitative tasks. |
| Teacher Model | In distillation, the larger, more capable model that generates outputs used to train the smaller student model. |
| Student Model | In distillation, the smaller, faster model being trained to mimic the behavior of the larger teacher model. |
| Continued Pre-Training | A technique where a base model is further trained on unlabeled domain-specific text to learn new vocabulary and concepts before supervised fine-tuning. |
| JSONL Format | JSON Lines format -- a file format where each line is a valid JSON object. Required for Bedrock fine-tuning training data stored in S3. |
| Model Weights | The learnable parameters of a neural network that are adjusted during training. Fine-tuning modifies these weights to adapt the model to new tasks. |
- 'Labeled data' or 'input/output pairs' in an exam question -> answer is Supervised Fine-Tuning.
- 'Reward function' or 'feedback-based learning' -> answer is Reinforcement Fine-Tuning.
- 'Smaller, faster, cheaper model from a larger one' -> answer is Distillation.
- Fine-tuned models CANNOT use on-demand pricing -- they REQUIRE Provisioned Throughput.
- Fine-tuning modifies the MODEL WEIGHTS. RAG does NOT modify model weights -- it retrieves external data instead.
- Distillation can reduce model cost by up to 75% -- a key benefit for production cost optimization.
- Training data for fine-tuning must be stored in AMAZON S3 in JSONL format.
- Not all models support fine-tuning -- check Bedrock documentation for supported models.
- A JUDGE MODEL is used in reinforcement fine-tuning for SUBJECTIVE tasks like tone and quality.
- Continued pre-training + supervised fine-tuning is a two-step process for highly specialized domains.
- Fine-tuning creates a PRIVATE COPY of the model -- the base model is never modified.
Practice Questions
Q1. A company wants to train an Amazon Bedrock model to respond in a specific brand voice and tone. They have a dataset of 10,000 example customer conversations with ideal responses already written. Which fine-tuning technique should they use?
- Reinforcement Fine-Tuning
- Distillation
- Supervised Fine-Tuning
- RAG (Retrieval-Augmented Generation)
Answer: C
Supervised Fine-Tuning uses labeled input/output pairs -- in this case, the conversations are the inputs and the ideal responses are the labeled outputs. This is the textbook use case for supervised fine-tuning.
Q2. A startup needs a production AI model that is fast and cost-effective. They currently use a large, high-accuracy Bedrock model but costs are too high. Which technique creates a smaller, cheaper model that inherits behavior from the large one?
- Supervised Fine-Tuning
- Reinforcement Fine-Tuning
- Distillation
- Prompt Engineering
Answer: C
Distillation transfers knowledge from a large teacher model to a smaller student model, producing up to 75% cost reduction while maintaining similar behavior. It is specifically designed for this production efficiency use case.
Q3. After fine-tuning a model on Amazon Bedrock, a developer tries to use the on-demand pricing model to invoke it but receives an error. What is the most likely cause?
- Fine-tuned models are not supported in Amazon Bedrock
- Fine-tuned models require Provisioned Throughput, not on-demand pricing
- The model must be re-deployed to Amazon SageMaker after fine-tuning
- On-demand pricing is only available in us-east-1
Answer: B
On-demand pricing on Bedrock works only for base (unmodified) models. Fine-tuned, custom, and imported models require Provisioned Throughput -- you reserve capacity and pay monthly.
Q4. A company wants to fine-tune a model to evaluate whether code solutions are correct or incorrect. They only have the problem statements, not pre-written solutions. Which fine-tuning approach is MOST suitable?
- Supervised Fine-Tuning with labeled pairs
- Reinforcement Fine-Tuning with a Lambda-based reward function
- Distillation from a coding teacher model
- RAG with a code documentation knowledge base
Answer: B
Reinforcement Fine-Tuning is ideal when you have inputs but not labeled outputs. A Lambda function can programmatically evaluate code correctness (compile, run tests) and provide reward scores. This is an objective task that doesn't require a judge model.
Q5. A healthcare company wants to adapt a Foundation Model to understand medical terminology and concepts before fine-tuning it on specific clinical tasks. Which technique should they apply FIRST?
- Supervised Fine-Tuning
- Reinforcement Fine-Tuning
- Continued Pre-Training on medical literature
- Distillation from a medical expert model
Answer: C
Continued Pre-Training exposes the model to unlabeled domain-specific text (medical literature, journals, terminology) to learn new vocabulary and concepts. This should be done BEFORE supervised fine-tuning on specific clinical tasks.
Q6. In a distillation workflow on Amazon Bedrock, what role does Nova Premier typically play?
- The student model being trained
- The teacher model providing training outputs
- The reward function evaluating quality
- The vector database storing embeddings
Answer: B
Nova Premier is Amazon's most capable Nova model and is specifically recommended as the teacher model for distillation workflows. It generates high-quality outputs that smaller student models learn to mimic.
Amazon Bedrock - FM Evaluation
Why Evaluate a Foundation Model?
Before deploying a GenAI model in production, you need to objectively measure its quality, accuracy, safety, and suitability for your use case.
Two Evaluation Approaches on Bedrock:
1. Automatic Evaluation
Uses algorithms or another AI model to score outputs -- no human involvement needed.
*Programmatic:*
- Choose a task type (text summarization, Q&A, text classification, open-ended generation)
- Provide prompt datasets (your own or AWS built-in curated datasets)
- Metrics are computed automatically
- Results stored in Amazon S3
- Metrics: Toxicity, Accuracy, Robustness
*Model as a Judge:*
- A separate evaluator model (e.g., Claude 3.5 Sonnet) scores the outputs of the model being tested
- Useful when quality is hard to measure algorithmically
- Can evaluate models that live OUTSIDE of Bedrock (bring-your-own inference responses)
- Metrics: Helpfulness, Faithfulness, and more
2. Human Evaluation
- Real people assess generated outputs for quality
- Two workforce options:
- AWS Managed Work Team - AWS-sourced human reviewers
- Bring Your Own Workforce - your own employees or subject matter experts (SMEs)
- Can compare up to TWO models simultaneously
- Scoring methods: thumbs up/down, numerical ranking, custom criteria
Key Evaluation Metrics (Know These for the Exam):
| Metric | Full Name | Measures | Best For |
|---|---|---|---|
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation | Word/n-gram overlap between reference and generated text | Summarization, translation |
| BLEU | Bilingual Evaluation Understudy | Precision of n-gram matches; penalizes too-short outputs | Translation quality |
| BERTScore | BERT-based Semantic Score | Semantic (meaning) similarity using embeddings | Context-aware quality evaluation |
| Perplexity | -- | How confidently a model predicts the next token; lower = better | General language model quality |
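Perplexity in the table above has a simple closed form: the exponential of the average negative log-probability the model assigned to each actual next token. A minimal sketch with made-up per-token probabilities:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to each
    actual next token. Lower = the model was less 'surprised'."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

confident = perplexity([0.9, 0.8, 0.95])   # model usually predicted correctly
uncertain = perplexity([0.2, 0.1, 0.25])   # model was often surprised
```

A model that assigns probability 0.5 to every token has a perplexity of exactly 2 -- equivalent to guessing between two equally likely options at each step.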
ROUGE vs. BLEU vs. BERTScore -- The Key Difference:
- ROUGE and BLEU compare WORDS and word combinations (n-grams) -- they don't understand meaning
- BERTScore compares MEANING using embeddings -- 'happy' and 'joyful' score as similar even though they're different words
N-Grams Explained:
- 1-gram (unigram) = individual words
- 2-gram (bigram) = consecutive word pairs (e.g., 'apple fell', 'fell from')
- Higher n-gram = stricter matching; requires longer exact sequences to match
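The n-gram matching that ROUGE and BLEU rely on can be illustrated with a simplified ROUGE-N recall (real ROUGE also clips repeated matches and reports precision/F-measure variants):

```python
def ngrams(text, n):
    """All sequences of n consecutive words in the text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def rouge_n_recall(reference, candidate, n=1):
    """Fraction of reference n-grams that also appear in the candidate.
    Simplified: real ROUGE clips matches for repeated n-grams."""
    ref = ngrams(reference, n)
    cand = set(ngrams(candidate, n))
    if not ref:
        return 0.0
    return sum(1 for g in ref if g in cand) / len(ref)

reference = "the apple fell from the tree"
candidate = "the apple dropped from the tree"

unigram_score = rouge_n_recall(reference, candidate, n=1)  # higher
bigram_score = rouge_n_recall(reference, candidate, n=2)   # stricter, lower
```

Note that swapping 'fell' for 'dropped' lowers both scores even though the meaning is identical -- exactly the gap BERTScore closes by comparing embeddings instead of surface words.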
Benchmark Datasets:
Curated evaluation datasets used to measure model performance consistently:
- Test accuracy, speed, scalability, and BIAS across diverse topics and populations
- Extremely useful for detecting potential discrimination or unfair outputs
- Can be AWS built-in or custom datasets tailored to your business
Business Metrics for Model Evaluation:
Beyond technical scores, evaluate real-world impact:
- User satisfaction scores
- Conversion rates
- Average revenue per user
- Cross-domain performance
- Operational efficiency and cost per query
Additional Evaluation Metrics:
| Metric | Measures |
|---|---|
| F1 Score | Balance of precision and recall for classification tasks |
| Recall | What percentage of relevant items were retrieved |
| Precision | What percentage of retrieved items were relevant |
| Accuracy | Overall correctness of predictions |
| Toxicity | How often model generates harmful or offensive content |
| Faithfulness | Whether model response is factually consistent with source |
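The precision/recall/F1 relationships in the table above reduce to a few lines of set arithmetic. A minimal sketch over hypothetical retrieved vs. relevant item IDs:

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F1 from sets of item IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 4 items retrieved, 3 actually relevant, 2 in common.
p, r, f1 = precision_recall_f1(retrieved={"a", "b", "c", "d"},
                               relevant={"a", "b", "e"})
```

Here precision is 2/4 (half of what was retrieved was relevant) and recall is 2/3 (two of the three relevant items were found); F1 balances the two into a single score.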
Key Terms
| Term | Definition |
|---|---|
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation. Measures word/n-gram overlap between a reference text and generated text. Best for evaluating summarization and translation. |
| BLEU | Bilingual Evaluation Understudy. Measures precision of n-gram matches between generated and reference text, with a penalty for outputs that are too short. Primarily used for translation evaluation. |
| BERTScore | A semantic similarity metric that uses BERT embeddings to compare the MEANING of generated vs. reference text, rather than exact word matches. |
| Perplexity | A measure of how confidently a model predicts the next token. Lower perplexity = more confident and accurate model. |
| N-gram | A sequence of N consecutive words or tokens. ROUGE and BLEU use n-gram overlap to measure text similarity. |
| Benchmark Dataset | A curated collection of prompts and ideal answers used to objectively test a model's performance, accuracy, and potential bias across diverse topics. |
| Model as a Judge | An evaluation approach where a second, separate AI model (like Claude) scores the outputs of the model being evaluated, useful for nuanced or subjective quality assessment. |
| Faithfulness | An evaluation metric measuring whether a model's response is factually consistent with the source material provided. Critical for RAG applications. |
| Toxicity Score | A metric measuring how often a model generates harmful, offensive, or inappropriate content. Used to ensure responsible AI behavior. |
| Human Evaluation | Using real people (AWS workforce or your own team) to assess model outputs for quality, relevance, and appropriateness. |
| F1 Score | The harmonic mean of precision and recall, providing a single score that balances both metrics. Used for classification evaluation. |
| Ground Truth | The known correct answer or reference output that model predictions are compared against during evaluation. |
- ROUGE = summarization and translation. BLEU = translation. BERTScore = semantic/meaning similarity. Know which metric fits which use case.
- BERTScore is the ONLY metric that understands MEANING -- ROUGE and BLEU only compare word patterns.
- Perplexity: LOWER is BETTER. Lower perplexity = model is more confident and accurate.
- Benchmark datasets can detect model BIAS -- a frequently tested exam concept.
- Human evaluation can compare UP TO TWO models at the same time on Bedrock.
- 'Model as a Judge' means one AI model evaluates ANOTHER model's output -- no human needed.
- FAITHFULNESS measures whether output matches SOURCE FACTS -- critical for RAG applications.
- TOXICITY measures harmful content generation -- key for responsible AI compliance.
- AWS provides BUILT-IN benchmark datasets, but you can also bring your own custom datasets.
- For TRANSLATION quality specifically, BLEU is the standard metric -- know this for the exam.
Practice Questions
Q1. A company needs to evaluate whether their fine-tuned translation model produces accurate translations. Which evaluation metric is MOST appropriate for this use case?
- Perplexity
- BERTScore
- BLEU
- Toxicity Score
Answer: C
BLEU (Bilingual Evaluation Understudy) is specifically designed to evaluate translation quality. It measures how closely a generated translation matches a reference translation using n-gram precision, with a brevity penalty.
Q2. An AI team wants to ensure their customer service chatbot doesn't discriminate against any demographic group. Which Bedrock evaluation tool is MOST helpful for detecting this type of issue?
- CloudWatch Metrics
- Benchmark Datasets
- Provisioned Throughput
- Guardrails -- Denied Topics
Answer: B
Benchmark datasets are specifically designed to test models across diverse topics, demographics, and linguistic scenarios. They are the standard tool for detecting bias and potential discrimination in model outputs.
Q3. Which evaluation metric measures the SEMANTIC SIMILARITY of generated text, understanding that words like 'happy' and 'joyful' carry similar meaning?
- ROUGE-N
- BLEU
- Perplexity
- BERTScore
Answer: D
BERTScore uses embedding-based comparisons to measure semantic similarity -- it understands the MEANING of text, not just word matches. ROUGE and BLEU only compare exact word patterns (n-grams).
Q4. A model achieves a perplexity score of 15, while another model achieves a perplexity of 45 on the same test dataset. Which model is performing BETTER?
- The model with perplexity 45 -- higher is better
- The model with perplexity 15 -- lower is better
- Both are equivalent -- perplexity doesn't indicate quality
- Cannot determine without knowing the BLEU scores
Answer: B
Perplexity measures how confidently a model predicts the next token. LOWER perplexity means the model is more confident and accurate. A perplexity of 15 indicates better performance than 45.
Q5. A company wants to evaluate whether their RAG-based model's responses are factually consistent with the retrieved documents. Which evaluation metric should they prioritize?
- ROUGE
- Perplexity
- Faithfulness
- BLEU
Answer: C
Faithfulness measures whether the model's response is factually consistent with the source material (retrieved documents in RAG). This directly addresses the concern about factual accuracy in RAG applications.
Q6. A team needs subject matter experts from their own company to evaluate specialized medical AI outputs. Which Bedrock human evaluation option should they use?
- AWS Managed Work Team
- Bring Your Own Workforce
- Model as a Judge with Claude
- Automatic programmatic evaluation
Answer: B
Bring Your Own Workforce allows your own employees or subject matter experts to evaluate model outputs. For specialized medical content, internal medical experts would be more appropriate than AWS's general managed workforce.
RAG and Knowledge Bases
What is RAG (Retrieval-Augmented Generation)?
RAG is a technique that allows a Foundation Model to reference an external data source without being retrained or fine-tuned. It 'augments' the model's prompt with retrieved context from your private knowledge base.
The Core Problem RAG Solves:
Foundation Models are trained on data up to a cutoff date and know nothing about your private business data. RAG solves both problems -- it retrieves current, private, relevant data and injects it into the model's prompt at query time.
ASCII DIAGRAM: RAG (Retrieval-Augmented Generation) Flow
+-------------------------------------------------------------------------------------+
| RAG WORKFLOW ON AMAZON BEDROCK |
+-------------------------------------------------------------------------------------+
PHASE 1: INGESTION (ONE-TIME SETUP)
===================================
+-------------+ +-------------+ +-------------+ +-----------------+
| SOURCE | | CHUNKING | | EMBEDDING | | VECTOR |
| DOCUMENTS |--->| Split |--->| MODEL |--->| DATABASE |
| (S3) | | into | | (Titan | | (OpenSearch) |
| | | pieces | | Embed) | | |
+-------------+ +-------------+ +-------------+ +-----------------+
PDF, DOCX, ~500 words Convert to Store vectors
HTML, TXT per chunk vector arrays for search
PHASE 2: QUERY (EVERY USER REQUEST)
===================================
+-------------+ +-------------+ +-------------+ +-----------------+
| USER | | EMBEDDING | | VECTOR | | RETRIEVE |
| QUESTION |--->| MODEL |--->| SEARCH |--->| TOP K |
| | | | | (KNN) | | CHUNKS |
+-------------+ +-------------+ +-------------+ +-----------------+
"What is Vectorize Find similar Most relevant
the policy?" question embeddings text chunks
|
v
+-----------------------------------------------------------------------------------+
| AUGMENTED PROMPT |
| +-----------------------------------------------------------------------------+ |
| | Context: [Retrieved Chunk 1] [Retrieved Chunk 2] [Retrieved Chunk 3] | |
| | Question: What is the policy? | |
| | Answer based ONLY on the context above. | |
| +-----------------------------------------------------------------------------+ |
+-----------------------------------------------------------------------------------+
|
v
+-------------+ +-----------------------------------------------------------------+
| FOUNDATION |--->| GROUNDED RESPONSE |
| MODEL | | "Based on our policy documents, returns are accepted within |
| (Claude) | | 30 days for a full refund..." |
+-------------+ +-----------------------------------------------------------------+
Model uses Answer is grounded in your data,
context to not the model's general knowledge
answer
How RAG Works -- Step by Step:
- Data Ingestion - your documents are stored in Amazon S3 and chunked into smaller pieces
- Embedding - each chunk is converted into a vector (numerical representation) by an embeddings model (e.g., Amazon Titan Embeddings)
- Storage - vectors are stored in a vector database
- Query - a user sends a question to the model
- Search - the question is vectorized and used to search the knowledge base for semantically similar chunks
- Augmentation - relevant chunks are retrieved and combined with the original question into an 'augmented prompt'
- Generation - the Foundation Model receives the augmented prompt and generates a grounded, accurate response
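The ingestion and query steps above can be sketched end-to-end in plain Python. This is a toy illustration, not the Bedrock API: the bag-of-words `embed` function stands in for a real embeddings model (e.g., Amazon Titan Embeddings), and the in-memory list stands in for a vector database:

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy bag-of-words 'embedding' -- a real system uses a trained
    embeddings model that produces dense semantic vectors."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Phase 1: ingestion -- chunk, embed, store.
chunks = [
    "returns are accepted within 30 days for a full refund",
    "shipping takes three to five business days",
]
vocab = sorted({word for chunk in chunks for word in chunk.split()})
index = [(chunk, embed(chunk, vocab)) for chunk in chunks]

# Phase 2: query -- embed the question, retrieve the best chunk,
# and build the augmented prompt.
question = "what is the returns policy"
q_vec = embed(question, vocab)
top_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]
augmented_prompt = (f"Context: {top_chunk}\n"
                    f"Question: {question}\n"
                    f"Answer based only on the context above.")
```

In a production Bedrock Knowledge Base, the chunking, embedding, vector storage (e.g., OpenSearch), and prompt augmentation are all managed for you; the flow is the same.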
RAG vs. Fine-Tuning -- Critical Distinction:
| Aspect | RAG | Fine-Tuning |
|---|---|---|
| Changes model weights? | NO | YES |
| Data currency | Real-time / up-to-date | Frozen at training time |
| Cost | Lower (no model retraining) | Higher (computation-intensive) |
| Best for | Private/real-time data lookups | Domain adaptation, tone, style |
Embeddings and Vector Databases:
- An embedding is an array of numbers (a vector) that mathematically encodes the meaning of a piece of text
- Words/phrases with similar meaning have similar vectors (numerically close)
- Vector databases enable fast similarity search -- find the most relevant chunks for any query
- This is why RAG can find relevant context even if the exact words don't match
Vector Database Options on Bedrock:
| Database | Type | Notes |
|---|---|---|
| Amazon OpenSearch Service | AWS-native | Best for production RAG; KNN search; highly scalable |
| Amazon Aurora (PostgreSQL) | AWS-native | Relational + vector search |
| Amazon Neptune Analytics | AWS-native | Graph-based RAG (GraphRAG) |
| Amazon S3 Vectors | AWS-native | Cost-effective, durable, sub-second queries |
| MongoDB / Redis / Pinecone | External | Third-party options; Pinecone has a free tier |
Data Sources Supported by Bedrock Knowledge Bases:
Amazon S3 (primary), Confluence, Microsoft SharePoint, Salesforce, Web crawlers (websites/social media feeds)
Key RAG Use Cases:
- Customer service chatbot backed by product/FAQ knowledge base
- Legal research assistant referencing laws, cases, and regulations
- Healthcare Q&A using clinical guidelines and research papers
- Internal company chatbot accessing HR policies and documentation
Chunking Strategies:
- Fixed-size chunks - split by character or word count (simple but may break mid-sentence)
- Semantic chunks - split by paragraph or section boundaries (preserves context)
- Overlapping chunks - include some overlap between chunks to avoid missing context at boundaries
- Chunk size tradeoff: smaller = more precise retrieval but less context; larger = more context but less precise
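The overlapping-chunk strategy above can be sketched as a small helper. The word counts here are tiny for illustration; the ingestion diagram earlier uses roughly 500 words per chunk:

```python
def chunk_words(text, chunk_size=5, overlap=2):
    """Fixed-size chunking with overlap: each chunk repeats the last
    `overlap` words of the previous one, so context at chunk
    boundaries is never lost."""
    words = text.split()
    step = chunk_size - overlap  # must be positive
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

parts = chunk_words("a b c d e f g h i j", chunk_size=5, overlap=2)
# Each chunk shares its first two words with the tail of the previous one.
```

Tuning `chunk_size` and `overlap` is exactly the precision-vs-context tradeoff described above.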
GraphRAG (Knowledge Graph RAG):
An advanced RAG technique using Neptune Analytics. Instead of just retrieving text chunks, it traverses a knowledge graph to find related entities and relationships, providing richer context for complex queries.
Key Terms
| Term | Definition |
|---|---|
| RAG (Retrieval-Augmented Generation) | A GenAI technique that retrieves relevant data from an external knowledge base and injects it into the model's prompt before generation -- enabling accurate, up-to-date, and private-data-aware responses without retraining. |
| Knowledge Base (Bedrock) | An Amazon Bedrock feature that manages the end-to-end RAG pipeline -- ingesting documents, creating embeddings, storing vectors, and retrieving relevant context for model prompts. |
| Embedding | A numerical vector representation of text (or other data) that encodes its semantic meaning. Texts with similar meanings have mathematically similar embedding vectors. |
| Vector Database | A specialized database optimized for storing and searching embedding vectors. Enables fast semantic similarity search -- finding the most relevant content even without exact keyword matches. |
| Augmented Prompt | The prompt sent to the Foundation Model after RAG retrieval -- it combines the user's original question with the retrieved relevant context from the knowledge base. |
| Chunking | The process of splitting large documents into smaller text segments before embedding. Allows more precise retrieval of relevant sections rather than entire documents. |
| Amazon OpenSearch Service | AWS's primary production-grade vector database for RAG on Bedrock. Supports KNN (K-Nearest Neighbor) search for fast embedding similarity queries. |
| Semantic Search | Search that finds results based on meaning rather than exact keyword matches. Enabled by embeddings and vector similarity search. |
| GraphRAG | An advanced RAG technique using knowledge graphs (Neptune Analytics) to traverse entity relationships and provide richer context for complex queries. |
| Top-K Retrieval | In RAG, retrieving the K most similar (closest) document chunks to the user's query. K is configurable based on context window limits. |
| Amazon Titan Embeddings | AWS's embeddings model available on Bedrock. Converts text to vector representations for use in RAG knowledge bases and semantic search. |
| Data Ingestion | The process of loading source documents into a RAG knowledge base, where they are chunked, embedded, and stored in a vector database. |
- RAG does NOT change model weights -- it only injects external data into the prompt. Fine-tuning DOES change weights.
- RAG is ideal for REAL-TIME or PRIVATE data (company docs, policies, product catalogs) that the base model was never trained on.
- The flow is: Documents -> S3 -> Chunking -> Embedding Model -> Vector DB -> Similarity Search -> Augmented Prompt -> FM -> Response.
- Amazon OpenSearch Service is the recommended AWS-native vector database for production RAG workloads on Bedrock.
- If an exam question mentions 'grounding model responses in company data without retraining' -> the answer is RAG / Knowledge Bases.
- Pinecone has a FREE tier -- useful for learning/dev environments when avoiding OpenSearch costs.
- EMBEDDINGS enable semantic search -- finding relevant content by MEANING, not just keywords.
- The KEY ADVANTAGE of RAG over fine-tuning: data stays CURRENT (update S3, immediate effect) vs. FROZEN (requires retraining).
- GraphRAG uses KNOWLEDGE GRAPHS for complex queries involving entity relationships.
- Chunk size is a TRADEOFF: smaller = precise, larger = more context. Optimize for your use case.
Practice Questions
Q1. A company wants their AI chatbot to answer questions about their internal HR policies, which are updated frequently. They need the model to always reflect the latest policies without retraining it every time. Which approach is MOST appropriate?
- Supervised Fine-Tuning with HR documents
- RAG with a Knowledge Base connected to an S3 bucket of HR documents
- Distillation of an HR-specific model
- Prompt Engineering with HR keywords
Answer: B
RAG retrieves data from an external source at query time -- when the HR policy document in S3 is updated, the chatbot immediately has access to the new information without any model retraining. This is the ideal use case for RAG.
Q2. In Amazon Bedrock's RAG architecture, what is the purpose of an embeddings model?
- To fine-tune the Foundation Model with labeled data
- To convert text chunks into numerical vector representations for semantic similarity search
- To score the quality of model responses using BLEU metrics
- To compress large documents so they fit within the model's context window
Answer: B
An embeddings model converts text into high-dimensional numerical vectors. These vectors encode semantic meaning so that similar concepts have mathematically similar representations, enabling similarity search in the vector database.
Q3. Which Amazon Bedrock vector database option is recommended for production-grade RAG workloads requiring fast KNN (K-Nearest Neighbor) search and scalability?
- Amazon S3 Vectors
- Amazon Neptune Analytics
- Amazon OpenSearch Service
- Pinecone
Answer: C
Amazon OpenSearch Service is AWS's recommended production-ready vector database for RAG. It supports scalable KNN search across millions of vector embeddings with real-time query performance.
Q4. A legal firm updates their contract templates daily and needs their AI assistant to always reference the latest versions. Why is RAG better than fine-tuning for this scenario?
- RAG is cheaper than fine-tuning
- RAG retrieves current data at query time, while fine-tuning freezes knowledge at training time
- RAG produces more accurate legal language
- Fine-tuning cannot work with legal documents
Answer: B
The key advantage of RAG is data currency. RAG retrieves the latest documents from S3 at query time, so updates are immediate. Fine-tuning would require retraining the model every time templates change, which is impractical for daily updates.
Q5. A RAG system is returning irrelevant results even though the correct information exists in the knowledge base. The team discovers that search queries and document chunks are not matching well. What should they investigate?
- The Foundation Model's temperature setting
- The chunking strategy and embedding model quality
- The Provisioned Throughput capacity
- The guardrails configuration
Answer: B
Poor RAG retrieval is typically caused by suboptimal chunking (too large, too small, or breaking context) or embedding model quality. Better chunking strategies and potentially a different embeddings model can improve semantic matching.
Q6. An enterprise wants to answer complex queries that require understanding relationships between entities (e.g., 'Which products are manufactured by suppliers in Europe?'). Which RAG approach is MOST suitable?
- Standard RAG with OpenSearch
- GraphRAG with Amazon Neptune Analytics
- RAG with larger chunk sizes
- Fine-tuning with relationship data
Answer: B
GraphRAG uses knowledge graphs to traverse entity relationships. For queries requiring understanding of connections between entities (products -> suppliers -> locations), Neptune Analytics provides richer context than standard vector-based RAG.
More GenAI Concepts - Tokenization, Context Windows, and Embeddings
Tokenization:
The process of converting raw text into tokens -- the basic units an LLM processes.
ASCII DIAGRAM: Token Flow (Prompt -> Model -> Response)
+-------------------------------------------------------------------------------------+
| TOKEN FLOW IN LLM PROCESSING |
+-------------------------------------------------------------------------------------+
USER INPUT (Raw Text) TOKENIZATION
==================== ============
"What is machine learning?" ---> ["What", "is", "machine", "learn", "ing", "?"]
|
| Each token -> numerical ID
v
[1024, 318, 4673, 2193, 278, 30]
+-------------------------------------------------------------------------------------+
| CONTEXT WINDOW |
| +-------------------------------------------------------------------------------+ |
| | Input Tokens (your prompt) | Generated Output Tokens (model response) | |
| | ============================|===============================================| |
| | [1024, 318, 4673, 2193...] | [47, 789, 2523, 1456, 8834...] | |
| | | | |
| | INPUT COST | OUTPUT COST | |
| | (charged per 1K tokens) | (charged per 1K tokens) | |
| +-------------------------------------------------------------------------------+ |
| |
| Total must fit in CONTEXT WINDOW (e.g., 128K, 200K, 1M tokens) |
+-------------------------------------------------------------------------------------+
TOKEN-BY-TOKEN GENERATION (Non-Deterministic)
=============================================
Step 1: Given [Input], predict next token -> "Machine" (p=0.35)
Step 2: Given [Input + "Machine"], predict next -> "learning" (p=0.82)
Step 3: Given [Input + "Machine learning"], predict next -> "is" (p=0.71)
...
Result: "Machine learning is a subset of AI that enables systems to..."
DE-TOKENIZATION
===============
[47, 789, 2523, 1456, 8834...] ---> "Machine learning is a subset..."
Final human-readable response
Why Tokenize?
Models don't understand raw text directly. Tokenization converts words into numerical IDs that the model can process mathematically.
Tokenization Methods:
- Word-based - each full word becomes one token (simple but inefficient for rare words)
- Subword-based - common words stay whole; uncommon words split into sub-parts (more efficient)
- Example: 'unacceptable' -> 'un' + 'acceptable' (two tokens instead of one rare token)
- Example: 'Stephane' -> 'Steph' + 'ane' (the model recognizes 'Steph' as a common name prefix)
Why Tokenization Matters:
- Pricing on Bedrock is based on input and output TOKEN counts
- Context window limits are measured in TOKENS, not words or characters
- Efficient prompts = fewer tokens = lower cost
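Because both pricing and context limits are token-based, teams often estimate costs up front. A rough sketch -- the whitespace split below undercounts real subword tokens, and the per-1K prices are placeholders, not actual Bedrock rates:

```python
def estimate_cost(prompt, expected_output_tokens,
                  input_price_per_1k=0.003, output_price_per_1k=0.015):
    """Very rough cost estimate. The whitespace token count here is a
    naive stand-in -- real subword tokenizers usually produce MORE
    tokens than words -- and the per-1K prices are placeholders."""
    input_tokens = len(prompt.split())
    cost = (input_tokens / 1000) * input_price_per_1k \
         + (expected_output_tokens / 1000) * output_price_per_1k
    return input_tokens, round(cost, 6)

tokens, cost = estimate_cost(
    "Summarize this document in three bullet points.", 200)
```

Note how the output side dominates the estimate even for a short prompt -- output tokens are typically priced several times higher than input tokens.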
Context Window:
The maximum number of tokens a model can process at one time -- both input and output combined.
- Think of it as the model's 'working memory' -- it can only 'see' what fits in the window
- Content outside the context window is not considered during generation
- Larger context window = handle more data, longer conversations, bigger documents
Context Window Examples:
| Model | Context Window | Real-World Equivalent |
|---|---|---|
| GPT-4 Turbo | 128,000 tokens | ~90,000 words |
| Claude 2.1 | 200,000 tokens | ~150,000 words / entire novel |
| Google Gemini 1.5 Pro | 1,000,000 tokens | ~700,000 words / 1-hour video |
Trade-off: Larger context windows require more memory and compute -> higher cost per call.
Embeddings (Deep Dive):
Embeddings are numerical vector representations of text (or images, audio) that encode semantic meaning.
How Embeddings Work:
- Text is tokenized
- Each token passes through an embeddings model
- Output is a high-dimensional vector (e.g., 100 or 1,536 numbers)
- The numbers encode the meaning, context, sentiment, and relationships of the token
- Vectors are stored in a vector database for similarity search
Why Embeddings Are Powerful:
- Words with similar meaning have numerically similar vectors
- 'Dog' and 'puppy' are close in vector space; 'dog' and 'house' are far apart
- Enables semantic search -- find relevant content even without exact keyword matches
- Powers RAG, recommendation systems, and search applications
Visualizing High-Dimensional Vectors:
Humans can visualize 2D and 3D space but not 100+ dimensions. Dimensionality reduction techniques compress vectors to 2D/3D for visualization, showing clusters of semantically related words.
Embeddings in Practice (Exam Scenario):
A search application uses an embeddings model to convert user queries and document chunks into vectors. The system then finds documents with the closest vector to the query -- returning semantically relevant results even if the exact words don't match.
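The scenario above boils down to ranking stored vectors by similarity to a query vector. A minimal sketch with cosine similarity, using made-up 3-dimensional vectors (real embedding models output hundreds or thousands of dimensions):

```python
import math

# Made-up 3-dimensional "embeddings" for illustration only.
# 'dog' and 'puppy' point in a similar direction; 'house' does not.
DOCS = {
    "dog":   [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "house": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, docs):
    """Return doc keys ranked by similarity to the query vector."""
    return sorted(docs, key=lambda k: cosine_similarity(query_vec, docs[k]),
                  reverse=True)

# A query embedding near 'dog'/'puppy' ranks them above 'house',
# even though no keyword matching happens anywhere.
print(search([0.88, 0.77, 0.15], DOCS))
```

A vector database performs this same ranking at scale (typically with approximate nearest-neighbor indexes rather than a full scan).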
Inference Parameters:
- Temperature - controls randomness (0 = deterministic, 1 = creative)
- Top-K - limits selection to K most probable tokens
- Top-P (nucleus sampling) - limits selection to tokens within cumulative probability P
- Max Tokens - limits the maximum output length
- These affect OUTPUT QUALITY but NOT PRICING (pricing is based on actual tokens used)
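The mechanics of these parameters can be sketched in pure Python. This is a simplified model of how samplers typically combine temperature, Top-K, and Top-P; the toy logits are invented for illustration.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    """Pick a next token from raw scores ('logits'), roughly the way LLM
    samplers combine temperature, Top-K, and Top-P. Simplified sketch."""
    rng = random.Random(seed)
    # Temperature rescales logits before softmax: low T sharpens the
    # distribution (more deterministic), high T flattens it (more varied).
    t = max(temperature, 1e-6)  # avoid division by zero at T=0
    scaled = {tok: s / t for tok, s in logits.items()}
    m = max(scaled.values())
    probs = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(probs.values())
    probs = {tok: p / total for tok, p in probs.items()}

    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:   # keep only the K most probable tokens
        ranked = ranked[:top_k]
    if top_p is not None:   # smallest set whose cumulative probability >= P
        kept, cum = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        ranked = kept

    # Sample from whatever candidates survived the filters.
    total = sum(p for _, p in ranked)
    r = rng.random() * total
    for tok, p in ranked:
        r -= p
        if r <= 0:
            return tok
    return ranked[-1][0]

logits = {"the": 3.0, "a": 2.0, "cat": 1.0, "zebra": -2.0}
# Near-zero temperature is effectively greedy: always the top token.
print(sample_next_token(logits, temperature=0.01))  # 'the'
```

With a high temperature and no Top-K/Top-P, even the unlikely 'zebra' can occasionally be sampled, which is why the same prompt can produce different outputs across calls.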
Key Terms
| Term | Definition |
|---|---|
| Tokenization | The process of breaking raw text into tokens (words or sub-words) and converting them to numerical IDs that an LLM can mathematically process. |
| Subword Tokenization | A tokenization strategy where common words are kept whole and rare/long words are split into meaningful sub-parts, improving efficiency and vocabulary coverage. |
| Context Window | The maximum number of tokens (input + output combined) a model can process in a single request. Defines the model's 'working memory' for a conversation or task. |
| Embedding Vector | An array of numbers produced by an embeddings model that encodes the semantic meaning of a piece of text. Semantically similar text produces numerically similar vectors. |
| Semantic Similarity | The degree to which two pieces of text have the same meaning, regardless of whether they use the same exact words. Embeddings enable semantic similarity measurement. |
| Dimensionality Reduction | A technique to compress high-dimensional vectors (e.g., 1,000 dimensions) into 2D or 3D for visualization, revealing semantic clusters and relationships. |
| KNN (K-Nearest Neighbor) Search | A vector search algorithm that finds the K most similar vectors to a query vector in a database. Used in vector databases to retrieve the most semantically relevant content. |
| Temperature | An inference parameter that controls the randomness of model output. Low temperature (0) = more deterministic and focused. High temperature (1) = more creative and varied. |
| Top-K Sampling | An inference parameter that limits the model's next-token selection to the K most probable tokens, reducing randomness while maintaining some variety. |
| Top-P (Nucleus) Sampling | An inference parameter where the model selects from the smallest set of tokens whose cumulative probability exceeds P. Dynamically adjusts choices based on probability distribution. |
| Max Tokens | An inference parameter that sets the maximum number of tokens the model will generate in its response. Does not affect input token limits. |
| De-tokenization | The process of converting numerical token IDs back into human-readable text after the model generates its response. |
- Pricing on Bedrock is PER TOKEN -- shorter prompts and responses = lower cost.
- Context window = model's working memory. Content that doesn't fit in the window is not considered during generation.
- Larger context window -> HIGHER cost per call. It's always a tradeoff.
- Embeddings enable SEMANTIC search -- find relevant content by MEANING, not just exact keywords.
- The exam may ask what tool/service enables semantic search in a knowledge base -> Embeddings + Vector Database.
- Tokens != words. A word can be 1-3 tokens depending on length and frequency.
- Temperature, Top-K, Top-P affect OUTPUT QUALITY but NOT PRICING.
- Low temperature = more DETERMINISTIC output. High temperature = more CREATIVE output.
- Input tokens AND output tokens BOTH count toward pricing -- optimize both.
- Subword tokenization is more EFFICIENT than word-based -- common in modern LLMs.
Practice Questions
Q1. A company wants to build a search system that returns relevant documents even when the user's search terms don't exactly match the words in the documents. Which GenAI capability enables this?
- Tokenization
- Fine-Tuning with labeled pairs
- Embeddings with a vector database
- Guardrails with keyword filters
Answer: C
Embeddings convert text into semantic vectors, and vector databases enable similarity search. Because similar meanings produce similar vectors, the system can find relevant documents even without exact keyword matches -- this is semantic search.
Q2. A developer is using a Bedrock model with a 128,000-token context window for a document analysis task. The document they upload contains 200,000 tokens. What will happen?
- The model automatically splits the document and processes it in multiple calls
- Content beyond the 128,000-token limit will not be considered by the model in that request
- Bedrock automatically increases the context window for large documents
- The model will reject the request and return an error
Answer: B
A model's context window is a hard limit on how many tokens it can process at once. Content beyond that limit is simply not considered -- the model has no awareness of it. For very large documents, you would need a model with a larger context window or a RAG approach to selectively retrieve relevant sections.
Q3. A developer wants their GenAI application to produce more creative and varied outputs for a brainstorming tool. Which inference parameter should they increase?
- Max Tokens
- Temperature
- Context Window
- Input Token Count
Answer: B
Temperature controls output randomness. Higher temperature (closer to 1) produces more creative, varied, and unpredictable outputs -- ideal for brainstorming. Lower temperature produces more focused, deterministic outputs.
Q4. What is the PRIMARY reason that the same prompt can produce different outputs when sent to an LLM multiple times?
- Network latency variations
- Token-by-token generation using probabilistic sampling
- Model retraining between requests
- Random initialization of the context window
Answer: B
LLMs generate output token-by-token, sampling from a probability distribution of possible next tokens. This probabilistic sampling introduces randomness, causing different outputs from the same prompt (unless temperature is set to 0).
Q5. A company is optimizing their Bedrock costs. They notice their prompts include lengthy context that may not always be necessary. What is the MOST direct way to reduce costs?
- Switch to a model with a larger context window
- Increase the temperature parameter
- Reduce input token count by crafting more concise prompts
- Enable Provisioned Throughput
Answer: C
Bedrock charges per token for both input and output. Reducing the number of input tokens by crafting more concise, focused prompts directly reduces costs. Longer context windows and Provisioned Throughput would likely increase costs.
Q6. Which tokenization method is MOST efficient for handling rare words and names that may not be in a model's vocabulary?
- Word-based tokenization
- Character-based tokenization
- Subword tokenization
- Sentence tokenization
Answer: C
Subword tokenization splits rare/unknown words into meaningful sub-parts (e.g., 'Stephane' -> 'Steph' + 'ane'). This handles out-of-vocabulary words efficiently without requiring a separate token for every possible word or resorting to character-by-character processing.
Amazon Bedrock - Guardrails
What are Guardrails?
Guardrails are a configurable safety layer in Amazon Bedrock that control the interaction between users and Foundation Models. They filter inputs and outputs to ensure responsible, safe, and on-topic AI behavior.
ASCII DIAGRAM: Guardrails Filter Flow
+-------------------------------------------------------------------------------------+
| GUARDRAILS FILTER FLOW |
+-------------------------------------------------------------------------------------+
USER INPUT INPUT GUARDRAILS FOUNDATION MODEL
========== ================ ================
+--------------+ +-----------------------------+ +------------------+
| User | | +=======================+ | | |
| Prompt |--->| | CONTENT FILTERS | |--->| FM |
| | | | * Hate speech? | | | (Claude, |
| | | | * Violence? | | | Titan, etc.) |
+--------------+ | | * Sexual content? | | | |
| +=======================+ | +--------+---------+
If blocked: | | DENIED TOPICS | | |
+--------------+ | | * Competitor info? | | v
| "Sorry, I |<---| | * Off-topic request? | | +------------------+
| cannot help | | +=======================+ | | Model generates |
| with that." | | | WORD FILTERS | | | response |
+--------------+ | | * Profanity? | | +--------+---------+
| | * Blocked phrases? | | |
| +=======================+ | v
| | PII DETECTION | | +------------------------------+
| | * SSN in input? | | | OUTPUT GUARDRAILS |
| | * Credit cards? | | | +========================+ |
| +=======================+ | | | CONTENT FILTERS | |
+-----------------------------+ | | * Harmful content? | |
| +========================+ |
| | PII MASKING | |
| | * Email: [REDACTED] | |
| | * Phone: [REDACTED] | |
| +========================+ |
| | CONTEXTUAL GROUNDING | |
| | * Response matches | |
| | source facts? | |
| | * Reduces hallucin. | |
| +========================+ |
+---------------+--------------+
|
v
+------------------------------+
| SAFE, GROUNDED RESPONSE |
| (PII masked, on-topic, |
| factually grounded) |
+------------------------------+
MONITORING: CloudWatch -> content_filtered_count metric
Why Use Guardrails?
- Prevent your model from generating harmful, offensive, or inappropriate content
- Enforce topic restrictions (e.g., a customer service bot should only answer product questions)
- Protect user privacy by removing personally identifiable information (PII)
- Reduce hallucinations (model inventing facts that aren't true)
- Meet compliance and governance requirements
- Monitor and analyze guardrail violations for ongoing tuning
What Guardrails Can Configure:
1. Content Filters:
Control the strength of filtering for harmful content categories:
- Hate speech
- Insults
- Sexual content
- Violence
- Misconduct
Filter strength is adjustable -- you choose how aggressively to block each category.
2. Denied Topics:
Define topics your model should NEVER discuss.
- Provide a topic name, definition, and optional example phrases
- Example: Block all food recipes so a healthcare chatbot stays on-topic
- When triggered: user receives a customizable blocked message (e.g., 'Sorry, this model cannot answer this question')
3. Word Filters:
- Block specific words or phrases (profanity, competitor names, etc.)
- Upload custom word/phrase lists
4. Sensitive Information Filters (PII Masking):
- Automatically detect and MASK PII in model responses
- Supported PII types: email addresses, phone numbers, SSNs, credit card numbers, and more
- Also supports custom regex patterns for domain-specific sensitive data
- Masking keeps responses useful while protecting privacy
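The masking behavior can be sketched with plain regexes. This only mirrors what the sensitive-information filters do conceptually; the 'ACC-######-##' account format is hypothetical.

```python
import re

# Sketch of PII masking with standard and custom regex patterns,
# conceptually mirroring Bedrock Guardrails' sensitive-information
# filters. The account-number format is a hypothetical internal one.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CUSTOM_ACCOUNT": re.compile(r"ACC-\d{6}-\d{2}"),
}

def mask_pii(text: str) -> str:
    """Replace each detected entity with a [TYPE] placeholder,
    keeping the rest of the response intact."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Contact jane@example.com about account ACC-123456-78."))
# Contact [EMAIL] about account [CUSTOM_ACCOUNT].
```

The key property, as noted above, is that the response stays useful: only the sensitive spans are redacted, not the whole message.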
5. Contextual Grounding:
- Reduces hallucinations by verifying that model responses are grounded in provided context
- Two checks: Grounding (response matches the source) and Relevance (response answers the question)
- Critical for RAG-based applications
How Guardrails Work in Practice:
- Applied to BOTH input (user prompt) and output (model response)
- Multiple guardrails can be stacked on a single model
- Applied in the Bedrock playground and via API in production applications
- Violations are logged and can trigger CloudWatch alarms
Monitoring Guardrails with CloudWatch:
- content_filtered_count metric tracks how often guardrails block content
- Build alarms to alert when blocking rates spike (could indicate prompt injection attempts or policy gaps)
Prompt Injection Defense:
Guardrails help defend against prompt injection -- malicious attempts to override model instructions. Content filters and denied topics can block manipulative prompts.
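Pulling the pieces together, a guardrail definition combines denied topics, content filters, PII handling, and blocked messages in one configuration. The sketch below assembles such a configuration as a plain dict; the shape loosely follows the boto3 `bedrock` `create_guardrail` request, but treat the exact field names as assumptions and check the current API reference before use.

```python
# Sketch of a guardrail definition. Field names follow the boto3
# create_guardrail request shape as an assumption -- verify against
# the current Amazon Bedrock API reference.
guardrail_config = {
    "name": "support-bot-guardrail",
    "topicPolicyConfig": {                 # denied topics
        "topicsConfig": [{
            "name": "CompetitorInfo",
            "definition": "Questions about competitor products or pricing.",
            "type": "DENY",
        }]
    },
    "contentPolicyConfig": {               # harmful-content filters
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    "sensitiveInformationPolicyConfig": {  # PII masking
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "ANONYMIZE"}]
    },
    "blockedInputMessaging": "Sorry, I cannot help with that.",
    "blockedOutputsMessaging": "Sorry, I cannot help with that.",
}
# In a real application this would be passed to the Bedrock control-plane
# client, e.g. boto3.client("bedrock").create_guardrail(**guardrail_config).
print(sorted(guardrail_config))
```

Note how input and output each get their own blocked message and filter strengths, reflecting that guardrails apply on both sides of the model call.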
Key Terms
| Term | Definition |
|---|---|
| Guardrails (Bedrock) | A configurable safety layer in Amazon Bedrock that filters inputs and outputs to block harmful content, restrict topics, remove PII, and reduce hallucinations. |
| PII Masking | A guardrail feature that automatically detects and redacts personally identifiable information (emails, phone numbers, SSNs, etc.) from model responses. |
| Denied Topics | A guardrail configuration that prevents the model from engaging with specific subject areas, returning a customizable blocked message instead. |
| Contextual Grounding | A guardrail feature that checks whether model responses are factually grounded in the provided context, reducing the risk of hallucinations. |
| Hallucination | When an AI model generates confident-sounding but factually incorrect or fabricated information. Guardrails and RAG grounding help reduce this risk. |
| content_filtered_count | A CloudWatch metric from Amazon Bedrock that tracks how many model invocations were blocked or modified by guardrails. |
| Content Filters | Guardrail settings that control filtering of harmful content categories like hate speech, violence, sexual content, and insults. Filter strength is adjustable. |
| Word Filters | A guardrail feature that blocks specific words or phrases from appearing in model inputs or outputs. Supports custom word lists. |
| Prompt Injection | A security attack where a malicious user crafts input designed to trick the model into ignoring its instructions or revealing sensitive information. |
| Filter Strength | The configurable intensity level (e.g., low, medium, high) for content filters. Higher strength blocks more content but may increase false positives. |
| Blocked Message | The customizable response returned to users when guardrails block their request (e.g., 'I cannot help with that topic'). |
- Guardrails apply to BOTH the input (user prompt) AND the output (model response).
- PII masking = automatically REDACTS sensitive personal info from responses, but keeps the rest of the response intact.
- Denied topics = completely BLOCK a subject area. Content filters = REDUCE harmful content by category and severity.
- Multiple guardrails can be STACKED -- they work together, not instead of each other.
- Contextual grounding specifically reduces HALLUCINATIONS -- a frequent exam topic.
- Guardrail violations are monitored via CloudWatch using the content_filtered_count metric.
- Guardrails can help defend against PROMPT INJECTION attacks.
- PII masking supports CUSTOM REGEX patterns for domain-specific sensitive data.
- Content filter STRENGTH is adjustable -- higher strength = more aggressive blocking.
- Guardrails are applied in BOTH the playground AND production API calls.
Practice Questions
Q1. A company builds a legal research chatbot on Amazon Bedrock. They want to ensure the chatbot never discusses competitor legal services and automatically removes any client email addresses from responses. Which Bedrock features should they configure?
- Content Filters and Contextual Grounding
- Denied Topics and Sensitive Information Filters (PII Masking)
- Word Filters and Model Evaluation
- Fine-Tuning and Provisioned Throughput
Answer: B
Denied Topics prevents the model from engaging with specific subjects (competitor services). Sensitive Information Filters with PII masking automatically detects and redacts email addresses from model outputs. Both are guardrail features.
Q2. A RAG-based customer service bot sometimes generates responses that sound confident but are not supported by the company's knowledge base documents. Which Guardrail feature is MOST effective at reducing this problem?
- Content Filters set to maximum strength
- Denied Topics for off-topic questions
- Contextual Grounding checks
- Word Filters blocking unverified claims
Answer: C
Contextual Grounding verifies that the model's response is factually supported by the retrieved source documents. It directly addresses hallucinations in RAG applications by checking both grounding (response matches sources) and relevance (response answers the question).
Q3. A security team notices a spike in the content_filtered_count CloudWatch metric for their Bedrock application. What might this indicate?
- The model is running out of context window capacity
- Users may be attempting prompt injection or sending inappropriate content
- Provisioned Throughput is insufficient
- The model needs to be fine-tuned
Answer: B
A spike in content_filtered_count means guardrails are blocking more content than usual. This could indicate prompt injection attempts, users testing system boundaries, or a surge in inappropriate requests. The security team should investigate the blocked inputs.
Q4. A healthcare company wants to ensure their AI assistant never discusses non-medical topics like politics or entertainment. Which guardrail configuration is MOST appropriate?
- Content Filters for violence and hate speech
- PII Masking for patient data
- Denied Topics for politics and entertainment
- Word Filters for political keywords
Answer: C
Denied Topics is designed to block entire subject areas. By defining 'politics' and 'entertainment' as denied topics with appropriate definitions, the guardrail will refuse to engage with those subjects entirely.
Q5. A financial services company needs their AI to detect and redact a custom internal account number format (e.g., 'ACC-######-##'). Which guardrail capability supports this?
- Word Filters with the account prefix
- PII Masking with custom regex patterns
- Content Filters set to high strength
- Denied Topics for account-related queries
Answer: B
PII Masking supports custom regex patterns in addition to standard PII types. The company can define a regex pattern matching their account number format (ACC-######-##) to automatically detect and redact these values.
Q6. A chatbot using Amazon Bedrock Guardrails blocks a user request and returns: 'I apologize, but I cannot assist with that topic.' What guardrail feature was likely triggered?
- PII Masking
- Contextual Grounding
- Denied Topics
- Content Filters at low strength
Answer: C
Denied Topics returns a customizable blocked message when users request information about prohibited subjects. The apologetic refusal message is typical of a denied topic trigger. PII masking redacts content rather than blocking, and content filters reduce harmful content without typically returning an explicit refusal.
Amazon Bedrock - Agents
What are Bedrock Agents?
Agents are intelligent orchestrators built into Amazon Bedrock that enable Foundation Models to autonomously plan and execute multi-step tasks -- going beyond simple Q&A to actually DOING things within your systems.
ASCII DIAGRAM: Agent Workflow
+-------------------------------------------------------------------------------------+
| BEDROCK AGENT WORKFLOW |
+-------------------------------------------------------------------------------------+
USER REQUEST
============
"Book me a flight to NYC next week and add it to my calendar"
|
v
+---------------------------------------------------------------------------------+
| BEDROCK AGENT |
| +--------------------------------------------------------------------------+ |
| | Step 1: UNDERSTAND THE TASK | |
| | Agent sends to FM: task + available actions + knowledge bases | |
| +--------------------------------------------------------------------------+ |
| | |
| v |
| +--------------------------------------------------------------------------+ |
| | Step 2: CHAIN-OF-THOUGHT REASONING (FM Plans Steps) | |
| | +---------------------------------------------------------------------+ | |
| | | 1. Get user preferences from profile | | |
| | | 2. Search available flights to NYC for next week | | |
| | | 3. Book best matching flight | | |
| | | 4. Create calendar event with flight details | | |
| | | 5. Confirm with user | | |
| | +---------------------------------------------------------------------+ | |
| +--------------------------------------------------------------------------+ |
| | |
| v |
| +--------------------------------------------------------------------------+ |
| | Step 3: EXECUTE EACH ACTION | |
| | | |
| | +----------------+ +----------------+ +----------------+ | |
| | | ACTION GROUP 1 | | ACTION GROUP 2 | | KNOWLEDGE | | |
| | | (Lambda) | | (REST API) | | BASE | | |
| | | | | | | | | |
| | | get_profile() | | book_flight() | | Search flight | | |
| | | add_calendar() | | | | policies | | |
| | +-------+--------+ +-------+--------+ +-------+--------+ | |
| | | | | | |
| | +----------+----------+----------+----------+ | |
| | | | | |
| | v v | |
| | +-------------------------------------------+ | |
| | | Results from each step feed into next | | |
| | +-------------------------------------------+ | |
| +--------------------------------------------------------------------------+ |
| | |
| v |
| +--------------------------------------------------------------------------+ |
| | Step 4: SYNTHESIZE FINAL RESPONSE | |
| | FM combines all action results into coherent response | |
| +--------------------------------------------------------------------------+ |
+---------------------------------------------------------------------------------+
|
v
+---------------------------------------------------------------------------------+
| FINAL RESPONSE TO USER |
| "I've booked your flight to NYC on March 8th at 9:00 AM for $349. The |
| confirmation number is ABC123 and I've added it to your calendar." |
+---------------------------------------------------------------------------------+
DEBUGGING: Agent Tracing shows every step, every API call, every decision
Agent vs. Basic Model:
- Basic model: receives a prompt, generates a response, done
- Agent: receives a task, reasons about HOW to accomplish it, executes a sequence of actions using APIs and tools, and returns a final result
What Agents Can Do:
- Query databases and APIs on your behalf
- Execute AWS Lambda functions (write to databases, trigger workflows)
- Search Knowledge Bases (RAG) for relevant information
- Plan a sequence of steps autonomously using chain-of-thought reasoning
- Handle multi-turn conversations while maintaining context
- Create, deploy, or modify infrastructure and application components
How Bedrock Agents Work -- Behind the Scenes:
- User submits a task to the agent
- Agent sends the task + available actions + knowledge bases + conversation history to a Foundation Model
- The FM uses chain-of-thought reasoning to generate an ordered list of steps
- The agent executes each step: calling APIs, running Lambda functions, or querying knowledge bases
- Results from each step feed into the next
- After all steps complete, the FM synthesizes all results into a final, coherent response
- Agent returns the final response to the user
Chain-of-Thought:
The process where the FM generates a logical step-by-step plan before executing actions, making agent behavior predictable, debuggable, and auditable.
Action Groups -- Defining What Agents Can Do:
Action groups are the tools available to an agent. Each group describes:
- What actions exist (e.g., get_order_history, place_order, get_shipping_policy)
- What inputs each action expects
- How to call each action (API endpoint via OpenAPI schema OR Lambda function)
Two Ways to Define Actions:
- OpenAPI Schema - define REST API endpoints the agent can call
- AWS Lambda Functions - run custom code for any action (database writes, external API calls, business logic)
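For the Lambda route, the agent invokes your function with an event describing which action it wants, and expects a structured response back. The sketch below follows the general shape of the documented agent-Lambda contract for OpenAPI-based action groups, but field names should be treated as assumptions and verified against the current Bedrock Agents documentation; the `/get_profile` action and its stub data are hypothetical.

```python
import json

# Hypothetical Lambda handler for a Bedrock Agent action group.
# The event/response field names follow the agent-Lambda contract as
# an assumption -- verify against current Bedrock Agents docs.
def lambda_handler(event, context):
    api_path = event.get("apiPath")
    if api_path == "/get_profile":
        body = {"preferred_airport": "JFK"}  # stub data for the sketch
    else:
        body = {"error": f"unknown action {api_path}"}
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": api_path,
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200,
            # The body is a JSON string keyed by content type, so the
            # agent can feed the result into its next reasoning step.
            "responseBody": {
                "application/json": {"body": json.dumps(body)}
            },
        },
    }

# The agent would invoke the function with an event like this:
event = {"actionGroup": "travel", "apiPath": "/get_profile", "httpMethod": "GET"}
print(lambda_handler(event, None)["response"]["httpStatusCode"])  # 200
```

The agent parses the returned body, incorporates it into its plan, and moves on to the next step, which is why returning well-structured JSON matters.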
Tracing:
Bedrock provides an agent tracing feature that shows every step the agent took -- which APIs it called, what data it retrieved, how it reasoned -- making debugging straightforward.
Real-World Example -- E-Commerce Agent:
A shopping agent configured with:
- Knowledge base: product catalog, return policy, shipping FAQ
- Actions: get_purchase_history, get_recommendations, add_to_cart, place_order
User: 'What size jacket should I order based on my previous purchases, and add the recommended one to my cart.'
Agent plan: (1) query order history -> (2) determine typical size -> (3) search product catalog for matching jackets -> (4) add recommended jacket to cart -> (5) confirm with user.
Agent Memory:
Agents can maintain conversation context across multiple turns, remembering previous questions and answers. This enables natural back-and-forth conversations rather than isolated Q&A.
Key Terms
| Term | Definition |
|---|---|
| Bedrock Agent | An intelligent orchestrator in Amazon Bedrock that autonomously plans and executes multi-step tasks by reasoning, calling APIs, running Lambda functions, and querying knowledge bases. |
| Action Group | A set of actions (API calls or Lambda functions) that a Bedrock agent is configured to use when executing tasks. Defines what the agent can DO in your systems. |
| Chain-of-Thought Reasoning | The process by which a Foundation Model generates an explicit step-by-step plan before acting -- making agent behavior more logical, predictable, and debuggable. |
| Agent Tracing | A Bedrock feature that records and displays every step an agent took -- which actions it called, what data it retrieved -- enabling full transparency and debugging. |
| OpenAPI Schema | A standardized format for describing REST API endpoints. Bedrock agents use OpenAPI schemas to understand how to call external APIs as part of their action groups. |
| Agent Memory | The ability of a Bedrock agent to maintain conversation context across multiple turns, enabling natural multi-turn conversations. |
| Multi-Step Task | A complex request that requires multiple sequential actions to complete, such as 'Book a flight and add it to my calendar.' |
| Orchestration | The process of coordinating multiple components (APIs, databases, models) to accomplish a task. Agents handle orchestration automatically. |
| Tool Use | The capability of an AI agent to call external tools (APIs, functions) as part of completing a task, rather than just generating text. |
| Session State | The context and variables maintained by an agent throughout a conversation, enabling it to reference previous interactions. |
- Agents = AUTONOMY + ACTION. They don't just answer -- they execute tasks across multiple systems.
- Chain-of-thought = the FM generates a PLAN of steps before executing. This is what makes agents smart.
- Action groups can call EXTERNAL APIs or AWS Lambda functions -- both are valid.
- Agents can use both ACTIONS (to do things) and KNOWLEDGE BASES (to look up information) in the same workflow.
- Agent TRACING lets you debug step-by-step -- it shows exactly what the agent did and why.
- If an exam question describes 'autonomously completing multi-step tasks using APIs' -> the answer is Bedrock Agents.
- Agents maintain MEMORY across conversation turns -- they remember context.
- OpenAPI schemas define HOW agents call external REST APIs.
- Lambda functions in action groups can execute ANY custom code -- database writes, calculations, third-party integrations.
- Agent orchestration is AUTOMATIC -- you define what actions are available, the agent figures out when to use them.
Practice Questions
Q1. A travel company wants to build an AI assistant that can search flight availability, book tickets, send confirmation emails, and update the customer's travel profile -- all in a single conversational interaction. Which Amazon Bedrock feature enables this?
- Knowledge Bases with RAG
- Model Fine-Tuning with Supervised Learning
- Bedrock Agents with Action Groups
- Guardrails with Denied Topics
Answer: C
Bedrock Agents autonomously plan and execute multi-step workflows. Action groups define the specific capabilities (search flights, book tickets, send emails, update profiles) that the agent can invoke. This is exactly the multi-step autonomous task execution use case for agents.
Q2. A developer wants to understand exactly what steps a Bedrock Agent took to fulfill a user's complex request, including which APIs were called and in what order. Which Bedrock feature provides this visibility?
- CloudWatch Metrics
- Guardrail violation logs
- Agent Tracing
- Model Evaluation with a judge model
Answer: C
Agent Tracing records the complete execution path of an agent -- every action called, every knowledge base query, and the reasoning at each step. It is the primary debugging tool for Bedrock Agents.
Q3. What is the PRIMARY difference between a basic Foundation Model invocation and using a Bedrock Agent?
- Agents are cheaper than direct model calls
- Agents can autonomously plan and execute multi-step tasks using external tools
- Agents generate more accurate text than base models
- Agents require less prompt engineering
Answer: B
The key difference is autonomy and action. Basic model calls generate text responses. Agents go further -- they reason about how to accomplish a task, create a plan, execute actions using APIs and Lambda functions, and synthesize results.
Q4. An e-commerce company wants their agent to be able to query product inventory and process returns. How should they configure these capabilities?
- Fine-tune the model with inventory and returns data
- Create action groups with Lambda functions or API definitions for each capability
- Add inventory and returns data to a RAG knowledge base
- Configure guardrails to allow inventory and returns topics
Answer: B
Action groups define what actions an agent can perform. For querying inventory (read) and processing returns (write), the company should create action groups that connect to Lambda functions or APIs that perform these operations.
Q5. A user asks a Bedrock Agent: 'Order the same coffee I got last week and deliver to my home.' The agent needs to: (1) look up last week's order, (2) get the user's home address, (3) place the order. What capability makes this multi-step reasoning possible?
- RAG retrieval from order history
- Chain-of-thought reasoning by the Foundation Model
- Guardrails contextual grounding
- Provisioned Throughput for fast responses
Answer: B
Chain-of-thought reasoning allows the Foundation Model to generate a logical multi-step plan before executing. The FM identifies the sequence of steps needed (lookup order, get address, place order) and the agent executes them in order.
Q6. A Bedrock Agent needs to call a third-party payment processing REST API. Which action group configuration method should the developer use?
- Create a Lambda function that calls the API
- Define the API using an OpenAPI schema
- Add the API documentation to a knowledge base
- Fine-tune the model with API examples
Answer: B
OpenAPI schemas are specifically designed to describe REST APIs in a standardized format. For external REST APIs like payment processors, defining the API using an OpenAPI schema allows the agent to call it directly. Lambda functions are better for custom logic or non-REST integrations.
Amazon Bedrock - CloudWatch Integration
Why Integrate Bedrock with CloudWatch?
Amazon Bedrock integrates with Amazon CloudWatch to provide full observability of your GenAI workloads -- logging every interaction, tracking performance metrics, and enabling alerting on critical thresholds.
Two Integration Points:
1. Model Invocation Logging (CloudWatch Logs)
Captures detailed records of every model invocation -- both inputs and outputs -- and sends them to CloudWatch Logs or Amazon S3.
What is Logged:
- Input text (user prompt)
- Output text (model response)
- Images and embeddings (optional)
- Model ID used
- Region
- Token counts (input, output, total)
- Latency (response time in milliseconds)
- Timestamps and invocation metadata
How to Enable:
- Go to Bedrock Settings -> Model Invocation Logging
- Choose destination: CloudWatch Logs, Amazon S3, or both
- Specify a CloudWatch Log Group (must exist first -- create in CloudWatch if needed)
- Assign an IAM service role with permission to write to CloudWatch Logs
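The steps above can be sketched with boto3's Bedrock control-plane call `put_model_invocation_logging_configuration`. The log group name and role ARN below are hypothetical placeholders -- both must already exist, since Bedrock will not create them for you. Treat this as a sketch; verify parameter names against the current boto3 documentation.

```python
# Sketch: enabling Bedrock Model Invocation Logging with boto3.
# Log group and IAM role are placeholders and must exist beforehand.

def build_logging_config(log_group, role_arn):
    """Build the loggingConfig payload for put_model_invocation_logging_configuration."""
    return {
        "cloudWatchConfig": {
            "logGroupName": log_group,  # must be created in CloudWatch first
            "roleArn": role_arn,        # IAM service role with CloudWatch Logs write access
        },
        "textDataDeliveryEnabled": True,        # log prompts and responses
        "imageDataDeliveryEnabled": False,      # images are optional
        "embeddingDataDeliveryEnabled": False,  # embeddings are optional
    }

def enable_invocation_logging(log_group, role_arn):
    # Requires AWS credentials; shown for illustration only.
    import boto3
    bedrock = boto3.client("bedrock")
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(log_group, role_arn)
    )

cfg = build_logging_config(
    "/bedrock/invocation-logs",                       # hypothetical log group
    "arn:aws:iam::123456789012:role/BedrockLogging",  # hypothetical role
)
print(cfg["cloudWatchConfig"]["logGroupName"])
```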
Use Cases for Invocation Logs:
- Debug slow or incorrect model responses
- Audit all AI interactions for compliance
- Analyze which models are being used most frequently
- Build custom dashboards on historical usage patterns
- Use CloudWatch Logs Insights for real-time log analysis and querying
2. CloudWatch Metrics
Bedrock automatically publishes operational metrics to CloudWatch that you can graph, dashboard, and alarm on.
Key Bedrock Metrics:
| Metric | What It Measures |
|---|---|
| Invocation count | Total number of model calls |
| Invocation latency | Response time per model call |
| Input token count | Tokens consumed as input |
| Output token count | Tokens generated as output |
| content_filtered_count | How often guardrails blocked/modified content |
Building Alarms on Bedrock Metrics:
Example alarms you can set:
- Alert if invocation latency exceeds 5 seconds (degraded user experience)
- Alert if content_filtered_count spikes (possible prompt injection attack)
- Alert if token usage exceeds budget thresholds
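A latency alarm like the first example above can be sketched with CloudWatch's `put_metric_alarm`. The namespace `AWS/Bedrock`, metric `InvocationLatency` (milliseconds), and the `ModelId` dimension match Bedrock's published metrics, but verify them in your console; the SNS topic ARN is a placeholder.

```python
# Sketch: alarm when Bedrock invocation latency exceeds 5 seconds.
# The SNS topic ARN is a made-up placeholder.

def build_latency_alarm(model_id, sns_topic_arn, threshold_ms=5000):
    """Build the parameter dict for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"bedrock-latency-{model_id}",
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Average",
        "Period": 60,               # evaluate every minute
        "EvaluationPeriods": 3,     # 3 consecutive breaches before alarming
        "Threshold": threshold_ms,  # latency is reported in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

def create_alarm(params):
    # Requires AWS credentials; shown for illustration only.
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**params)

alarm = build_latency_alarm(
    "anthropic.claude-3-haiku",                  # example model ID
    "arn:aws:sns:us-east-1:123456789012:alerts", # hypothetical SNS topic
)
print(alarm["Threshold"])  # -> 5000
```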
Important: IAM Role Requirement
Bedrock needs an IAM service role with permission to write logs to CloudWatch Logs and/or S3. This role must be created and specified when enabling invocation logging.
CloudTrail Integration:
In addition to CloudWatch, Bedrock integrates with AWS CloudTrail for API-level auditing:
- Tracks WHO made API calls (IAM identity)
- Records WHAT API calls were made (CreateAgent, InvokeModel, etc.)
- Logs WHEN and WHERE calls originated
- Does NOT log prompt/response content -- use CloudWatch Logs for that
Cost Monitoring:
Use CloudWatch with AWS Cost Explorer to:
- Track Bedrock spending by model
- Set budget alerts when costs approach limits
- Identify which applications consume the most tokens
Key Terms
| Term | Definition |
|---|---|
| Model Invocation Logging | A Bedrock feature that captures all model inputs, outputs, token counts, and latency data and sends them to CloudWatch Logs or Amazon S3 for auditing and debugging. |
| CloudWatch Logs Insights | An AWS service for querying and analyzing log data in CloudWatch Logs in near real-time. Can be used to analyze Bedrock invocation logs for patterns and issues. |
| Invocation Latency | The time between sending a prompt to a Bedrock model and receiving a complete response. Tracked as a CloudWatch metric; high latency signals performance issues. |
| content_filtered_count | A Bedrock CloudWatch metric that counts how many model invocations were blocked or modified by a guardrail. Useful for monitoring responsible AI compliance. |
| CloudWatch Alarm | An automated notification triggered when a CloudWatch metric crosses a defined threshold (e.g., latency too high, guardrail blocks too frequent). |
| IAM Service Role | An AWS IAM role assumed by AWS services (like Bedrock) to perform actions on your behalf. Required for Bedrock to write logs to CloudWatch. |
| AWS CloudTrail | A service that logs API calls made in your AWS account. For Bedrock, it records who called which APIs and when -- but not prompt/response content. |
| CloudWatch Log Group | A container for CloudWatch log streams. Must be created before enabling Bedrock invocation logging. |
| Invocation Count | A CloudWatch metric tracking the total number of model calls made to Bedrock. Useful for usage monitoring and capacity planning. |
- Model Invocation Logging sends to CLOUDWATCH LOGS or S3 -- not to CloudWatch Metrics (those are separate).
- You must CREATE the CloudWatch Log Group BEFORE enabling invocation logging -- Bedrock won't create it automatically.
- An IAM service role is REQUIRED for Bedrock to write logs -- it must have CloudWatch Logs write permissions.
- content_filtered_count metric = monitor guardrail activity. High count could mean prompt injection attempts.
- Invocation logs include: prompt text, response text, model ID, token counts, and latency.
- CloudWatch Logs Insights can be used to ANALYZE invocation logs in real time -- useful for debugging.
- CloudTrail logs WHO called APIs but NOT the prompt/response content -- use CloudWatch Logs for content.
- Set CloudWatch ALARMS for latency spikes, error rates, and guardrail blocks to catch issues early.
- Token counts in logs help with COST ANALYSIS -- identify expensive prompts and applications.
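The cost-analysis tip above can be put into practice with a CloudWatch Logs Insights query over the invocation logs. The field names (`modelId`, `input.inputTokenCount`, `output.outputTokenCount`) are assumed from the invocation-log JSON schema -- check them against your own log events before relying on this query.

```python
# Sketch: a Logs Insights query that totals token usage per model,
# submitted via the CloudWatch Logs start_query API.

QUERY = """
fields @timestamp, modelId
| stats sum(input.inputTokenCount) as inTokens,
        sum(output.outputTokenCount) as outTokens by modelId
| sort outTokens desc
"""

def start_insights_query(log_group, start_epoch, end_epoch):
    # Requires AWS credentials; shown for illustration only.
    import boto3
    logs = boto3.client("logs")
    resp = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,
        endTime=end_epoch,
        queryString=QUERY,
    )
    return resp["queryId"]  # poll get_query_results with this id

print("stats" in QUERY)  # the query aggregates token counts by model
```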
Practice Questions
Q1. A compliance team requires a full audit trail of every prompt sent to and every response received from their Amazon Bedrock models. Which feature should be enabled?
- CloudWatch Metrics for Bedrock
- Bedrock Guardrails with content filters
- Model Invocation Logging to CloudWatch Logs or S3
- Agent Tracing for all model calls
Answer: C
Model Invocation Logging captures the complete input and output of every Bedrock model call and persists it to CloudWatch Logs or S3. This is the appropriate feature for compliance audit trail requirements.
Q2. An operations team wants to be alerted automatically when the response latency of their Amazon Bedrock model exceeds 4 seconds. Which AWS service combination achieves this?
- AWS Config Rule monitoring Bedrock API calls
- Bedrock Guardrails with a latency threshold
- CloudWatch Metric for invocation latency + CloudWatch Alarm
- AWS Trusted Advisor latency recommendations
Answer: C
Bedrock publishes invocation latency as a CloudWatch Metric. You can create a CloudWatch Alarm that triggers when this metric exceeds your 4-second threshold, sending a notification via SNS. This is the standard AWS pattern for metric-based alerting.
Q3. A developer enables Model Invocation Logging for Amazon Bedrock but receives an error that the CloudWatch Log Group doesn't exist. What should they do?
- Wait for Bedrock to automatically create the log group
- Create the CloudWatch Log Group manually before enabling logging
- Use S3 logging instead -- CloudWatch Logs is not supported
- Update the Bedrock service-linked role
Answer: B
Bedrock does not automatically create CloudWatch Log Groups. The log group must be created manually in CloudWatch before enabling invocation logging in Bedrock settings.
Q4. A security auditor wants to know which IAM user called the CreateAgent API in Amazon Bedrock last week. Which AWS service provides this information?
- CloudWatch Logs
- CloudWatch Metrics
- AWS CloudTrail
- Model Invocation Logging
Answer: C
AWS CloudTrail logs all API calls made in your AWS account, including who made the call (IAM identity), what API was called, and when. For API-level auditing (like CreateAgent), CloudTrail is the correct service.
Q5. A company wants to analyze their Bedrock usage to identify which application sends the most expensive prompts. Which combination of tools should they use?
- Guardrails and Denied Topics
- Model Invocation Logging with CloudWatch Logs Insights
- Model Evaluation with benchmark datasets
- Provisioned Throughput monitoring
Answer: B
Model Invocation Logging captures token counts for every call. CloudWatch Logs Insights can query this data to analyze patterns, identify applications with high token usage, and calculate costs per application or prompt type.
Amazon Bedrock - Pricing
Amazon Bedrock Pricing Models:
1. On-Demand (Pay-As-You-Go)
- No upfront commitment; charged only for what you use
- Text models: charged per 1,000 input tokens and per 1,000 output tokens
- Embeddings models: charged per 1,000 input tokens
- Image models: charged per image generated
- Best for: unpredictable or variable workloads, development, and testing
- Works with BASE models only (not fine-tuned or custom models)
2. Batch Mode
- Submit multiple inference requests together as a batch job
- Results delivered to Amazon S3 (not in real-time)
- Discount: up to 50% cheaper than on-demand pricing
- Best for: non-time-sensitive, high-volume processing (e.g., batch summarization, classification)
- Trade-off: not real-time -- responses arrive later
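A batch job can be submitted with the Bedrock `create_model_invocation_job` API, which reads prompts from one S3 location and writes results to another once the job completes (not in real time). Bucket URIs, the role ARN, and the model ID below are illustrative placeholders.

```python
# Sketch: submitting a Bedrock batch inference job with boto3.
# All names below are placeholders; the role needs S3 access to both buckets.

def build_batch_job_params(job_name, model_id, role_arn, in_uri, out_uri):
    """Build the parameter dict for bedrock.create_model_invocation_job."""
    return {
        "jobName": job_name,
        "modelId": model_id,
        "roleArn": role_arn,
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": in_uri}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": out_uri}},
    }

def submit_batch_job(params):
    # Requires AWS credentials; shown for illustration only.
    import boto3
    resp = boto3.client("bedrock").create_model_invocation_job(**params)
    return resp["jobArn"]

params = build_batch_job_params(
    "nightly-summaries",
    "amazon.titan-text-express-v1",                # example batch-capable model
    "arn:aws:iam::123456789012:role/BedrockBatch", # hypothetical role
    "s3://my-bucket/prompts/",
    "s3://my-bucket/results/",
)
print(params["jobName"])  # -> nightly-summaries
```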
3. Provisioned Throughput
- Reserve a guaranteed level of capacity (model units) for a fixed period (1 month or 6 months)
- Guarantees: maximum input + output tokens per minute
- Required for: fine-tuned models, custom models, and imported models (cannot use on-demand)
- NOT primarily a cost-saving measure -- purpose is PERFORMANCE and CAPACITY GUARANTEE
- Best for: production workloads requiring consistent, predictable throughput
Pricing by Improvement Technique:
| Technique | Cost | Reasoning |
|---|---|---|
| Prompt Engineering | Very low | No training; just craft better prompts |
| RAG | Low-Medium | No model change; vector DB + search costs |
| Instruction-Based Fine-Tuning | Medium | Some additional computation; labeled data prep |
| Full Domain Fine-Tuning | High | Unlabeled data at scale + intensive GPU compute |
Cost Optimization Strategies:
- Use Batch Mode - up to 50% savings for non-real-time tasks
- Choose smaller models - less capable but much cheaper; test if accuracy is sufficient
- Optimize token usage - shorter, more efficient prompts; request concise outputs
- Prompt Engineering first - cheapest improvement technique; no extra infrastructure
- Avoid Provisioned Throughput for cost savings - use it for performance, not savings
- Temperature, Top K, Top P settings - change model behavior but do NOT affect pricing
Key Cost Driver:
The PRIMARY driver of Bedrock cost is the number of input AND output tokens. Shorter prompts and concise responses directly reduce costs.
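A toy calculator makes the token-driven cost model concrete. The per-1K prices here are made-up illustration values, not real Bedrock rates; note how the (typically pricier) output tokens dominate when responses are long.

```python
# Toy cost model for token-based on-demand pricing.
# Prices are illustrative, not actual Bedrock rates.

def invocation_cost(in_tokens, out_tokens,
                    in_price_per_1k=0.003, out_price_per_1k=0.015):
    """Cost of one call: tokens / 1000 * price-per-1K, input plus output."""
    return (in_tokens / 1000) * in_price_per_1k \
         + (out_tokens / 1000) * out_price_per_1k

# A 2,000-token prompt with a 500-token answer:
cost = invocation_cost(2000, 500)
print(round(cost, 4))  # -> 0.0135
```

With these illustrative rates, the 500 output tokens ($0.0075) cost more than the 2,000 input tokens ($0.006) -- which is why requesting concise outputs is an effective optimization.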
Additional Cost Considerations:
- Model selection matters - Claude costs more per token than Titan; Llama is competitively priced
- Knowledge Bases - additional costs for vector database (OpenSearch), S3 storage, embedding model calls
- Agents - charged for model invocations + any Lambda/API calls made by action groups
- Guardrails - included in Bedrock pricing; no separate charge
- Fine-tuning jobs - charged for training compute time + storage of custom model
Pricing Comparison Example:
| Scenario | Best Pricing Model |
|---|---|
| Testing a new model | On-Demand |
| Processing 1M documents overnight | Batch Mode |
| Production chatbot with SLA | Provisioned Throughput |
| Fine-tuned customer service model | Provisioned Throughput (required) |
Key Terms
| Term | Definition |
|---|---|
| On-Demand Pricing | Bedrock's pay-as-you-go model where you are charged per token processed or image generated with no upfront commitment. Available for base models only. |
| Batch Mode | A Bedrock pricing option where multiple inference requests are grouped and processed together, delivering results to S3 with up to 50% cost savings compared to on-demand. |
| Provisioned Throughput | A Bedrock capacity reservation where you commit to paying monthly for a guaranteed maximum token throughput. Required for fine-tuned or custom models; focused on performance, not cost savings. |
| Prompt Engineering | The practice of carefully crafting input prompts to improve model output quality. The cheapest improvement technique -- requires no model training or infrastructure changes. |
| Model Units | The unit of capacity reserved in Provisioned Throughput. Each model unit guarantees a specific maximum number of tokens per minute for your model invocations. |
| Input Tokens | The tokens in your prompt sent to the model. Charged separately from output tokens; part of the primary cost driver. |
| Output Tokens | The tokens generated by the model in its response. Typically more expensive than input tokens for most models. |
| Token Cost | The per-token pricing charged by Bedrock. Varies significantly by model -- larger, more capable models cost more per token. |
| Commitment Period | For Provisioned Throughput, the time you commit to paying (1 month or 6 months). Longer commitments may offer better pricing. |
- Batch Mode = up to 50% cheaper, but results are NOT real-time. Trade-off: latency vs. cost.
- Provisioned Throughput is for PERFORMANCE and CAPACITY GUARANTEE -- not primarily for cost savings.
- Fine-tuned models CANNOT use on-demand pricing -- they REQUIRE Provisioned Throughput.
- Prompt Engineering has ZERO additional infrastructure cost -- it is purely crafting better prompts.
- Temperature, Top K, Top P = change output behavior but do NOT change the pricing.
- The PRIMARY cost driver = number of tokens (input + output). Shorter prompts = lower cost.
- Smaller models = cheaper + faster + less accurate. Always test before assuming accuracy is insufficient.
- Batch Mode delivers results to S3 -- not synchronous responses.
- Knowledge Bases add SEPARATE costs for vector DB, S3, and embedding model calls.
- Output tokens are typically MORE EXPENSIVE than input tokens -- optimize response length too.
Practice Questions
Q1. A data analytics company needs to summarize 100,000 customer reviews overnight. Cost is the primary concern and they don't need real-time results. Which Bedrock pricing model should they use?
- Provisioned Throughput for guaranteed capacity
- On-Demand for flexible pay-per-use
- Batch Mode for up to 50% cost savings
- Free Tier for the first 12 months
Answer: C
Batch Mode is designed exactly for this scenario -- high-volume, non-real-time processing at up to 50% savings versus on-demand. Results are delivered to S3 after processing, which is acceptable for an overnight batch job.
Q2. A startup wants to improve the quality of their Bedrock model's outputs as cheaply as possible. They don't have budget for model training. Which approach should they try FIRST?
- Fine-Tune the model with supervised learning
- Use Provisioned Throughput for better performance
- Apply Prompt Engineering techniques
- Enable RAG with an OpenSearch knowledge base
Answer: C
Prompt Engineering requires zero additional infrastructure or model training -- it's purely crafting better input prompts. It is by far the cheapest improvement technique and should always be tried first before investing in RAG, fine-tuning, or infrastructure changes.
Q3. A company has fine-tuned a Bedrock model for their customer service chatbot. They try to invoke it using the on-demand pricing model but receive an error. What is the reason?
- Fine-tuned models can only be used in the Bedrock playground, not via API
- Fine-tuned models are not supported in the AWS region they selected
- Fine-tuned models require Provisioned Throughput and cannot use on-demand pricing
- On-demand pricing is only available for image models, not text models
Answer: C
On-demand pricing in Bedrock works only with base (unmodified) Foundation Models. Fine-tuned, custom, and imported models must be deployed using Provisioned Throughput, where you commit to a monthly capacity reservation.
Q4. An architect is designing a Bedrock solution and wants to minimize costs. Which of the following affects Bedrock pricing?
- Temperature and Top-P parameter settings
- The number of input and output tokens processed
- The time of day when requests are made
- The AWS region where users are located
Answer: B
Bedrock pricing is primarily driven by the number of tokens processed (both input and output). Temperature, Top-P, and Top-K parameters affect output quality and randomness but do not change pricing. Time of day and user location don't affect token pricing.
Q5. A company implements a RAG-based application on Bedrock. Which of the following are ADDITIONAL costs beyond basic model invocation? (Select the best answer)
- Guardrails and content filtering
- Vector database (OpenSearch), S3 storage, and embedding model calls
- CloudWatch metrics collection
- Model catalog browsing
Answer: B
RAG requires additional infrastructure: a vector database like OpenSearch (for storing embeddings), S3 (for source documents), and embedding model calls (to vectorize documents and queries). These add to the base model invocation costs. Guardrails are included in Bedrock pricing.
Q6. A production application requires guaranteed capacity of 100,000 tokens per minute with no throttling. Which Bedrock pricing model provides this guarantee?
- On-Demand with high request rate
- Batch Mode with priority processing
- Provisioned Throughput with sufficient model units
- Multiple on-demand calls in parallel
Answer: C
Provisioned Throughput reserves guaranteed capacity measured in model units. Each model unit provides a specific maximum tokens per minute. This is the only Bedrock pricing model that provides capacity guarantees -- on-demand offers no throughput guarantees.
Amazon Nova - AWS's Foundation Model Family
What is Amazon Nova?
Amazon Nova is AWS's own family of Foundation Models, available through Amazon Bedrock. The models are designed to be fast, cost-effective, and enterprise-ready -- competing directly with offerings from OpenAI, Anthropic, and other providers.
Amazon Nova Model Tiers (Nova 1 Family):
| Model | Type | Capability | Notes |
|---|---|---|---|
| Nova Premier | Multimodal | Most capable; complex reasoning + best teacher for distillation | Highest accuracy, highest cost |
| Nova Pro | Multimodal | Best balance of accuracy, speed, and cost for wide range of tasks | Strong all-rounder |
| Nova Lite | Multimodal | Low-cost, lightning fast for image, video, and text inputs | Speed-optimized |
| Nova Micro | Text only | Lowest latency and lowest cost; text only | No image/video support |
| Nova Canvas | Image generation | State-of-the-art image generation | Text-to-image only |
| Nova Reel | Video generation | State-of-the-art video generation | Text-to-video or image-to-video |
| Nova Sonic | Speech | Conversational speech understanding and generation; multilingual | Voice/audio focused |
Amazon Nova 2 Family (Enhanced Capabilities):
| Model | Type | Use Cases |
|---|---|---|
| Nova 2 Lite | Multimodal (text, images, video, docs) | Fast, cost-effective reasoning for everyday workloads |
| Nova Sonic | Speech | Speech understanding and generation |
| Nova 2 Multimodal Embeddings | Embeddings | RAG use cases requiring multimodal vector search |
| Nova 2 Omni | All-in-one multimodal | Multimodal reasoning AND image generation combined |
Key Differentiators for Nova 2:
- Up to 1 million token context window
- Advanced reasoning capabilities
- Suitable for interactive chatbots, document/video analysis, and AI agents
Quick Reference -- Match Model to Use Case:
- Text + image + video understanding -> Nova Pro, Nova Lite, or Nova Premier
- Text ONLY, fastest/cheapest -> Nova Micro
- Generate IMAGES -> Nova Canvas
- Generate VIDEO -> Nova Reel
- SPEECH / voice interactions -> Nova Sonic
- RAG with multimodal data -> Nova 2 Multimodal Embeddings
- Everything in one model -> Nova 2 Omni
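The quick-reference list above can be drilled as a simple lookup table -- a study aid only, not an AWS API.

```python
# Study aid: map each use case from the quick reference to its Nova model.

NOVA_FOR_USE_CASE = {
    "multimodal understanding": "Nova Pro",  # or Nova Lite / Nova Premier
    "text only, cheapest": "Nova Micro",
    "image generation": "Nova Canvas",
    "video generation": "Nova Reel",
    "speech": "Nova Sonic",
    "multimodal rag": "Nova 2 Multimodal Embeddings",
    "all in one": "Nova 2 Omni",
}

print(NOVA_FOR_USE_CASE["video generation"])  # -> Nova Reel
```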
Distillation with Nova Premier:
Nova Premier is explicitly designed as the BEST TEACHER model for distillation -- use it to train smaller, cheaper student models that inherit its reasoning quality.
Nova vs. Third-Party Models:
| Consideration | Amazon Nova | Third-Party (Claude, Llama) |
|---|---|---|
| Provider | AWS-built, AWS-supported | External providers |
| Integration | Deep AWS integration | Standard Bedrock API |
| Pricing | Competitive, AWS-optimized | Varies by provider |
| Data residency | Stays within AWS | Also stays in your account -- prompts are not shared with the model provider |
| Image/Video generation | Canvas, Reel | Stability AI, others |
Nova for Agents:
Nova models are well-suited for Bedrock Agents due to their strong reasoning capabilities. Nova Pro and Nova Premier can plan and execute complex multi-step tasks effectively.
Key Terms
| Term | Definition |
|---|---|
| Amazon Nova | AWS's own family of Foundation Models available on Bedrock. Includes models for text, images, video, speech, and embeddings -- designed for enterprise use with speed and cost-effectiveness. |
| Nova Premier | The most capable Amazon Nova model. Best for complex reasoning tasks and as the teacher model in distillation workflows. |
| Nova Micro | The smallest, fastest, cheapest Amazon Nova model. Text-only; no image or video support. Best for high-volume, low-latency text tasks. |
| Nova Canvas | Amazon Nova's image generation model. Converts text prompts into images. |
| Nova Reel | Amazon Nova's video generation model. Converts text or images into video clips. |
| Nova Sonic | Amazon Nova's speech model. Handles conversational speech understanding and generation in multiple languages. |
| Nova 2 Omni | An all-in-one Amazon Nova model that combines multimodal reasoning (text, image, video, documents) with image generation capability. |
| Nova Pro | Amazon Nova's balanced model offering the best combination of accuracy, speed, and cost for general-purpose tasks. |
| Nova Lite | A fast, low-cost Nova model that accepts text, images, and video input. Optimized for speed over maximum capability. |
| Multimodal Embeddings | Embeddings that can represent multiple data types (text, images) in the same vector space, enabling search across different content types. |
- Canvas = IMAGES. Reel = VIDEO. Sonic = SPEECH/AUDIO. Know these three clearly -- common exam question.
- Nova Micro = TEXT ONLY + cheapest + lowest latency. No image/video input support.
- Nova Premier is the recommended TEACHER model for distillation workflows.
- Nova Pro = best BALANCE of accuracy, speed, and cost -- the default 'general purpose' choice.
- Nova 2 Omni = ALL-IN-ONE -- handles everything (text, images, video, documents, image generation) in a single model.
- Nova 2 context window = up to 1 MILLION tokens -- much larger than Nova 1 models.
- Amazon Nova is AWS's OWN model family -- use this when asked about AWS-native GenAI models.
- Nova Multimodal Embeddings enable RAG across different content types (text + images).
- For high-volume, cost-sensitive TEXT workloads, Nova Micro is the best choice.
- Nova models integrate DEEPLY with other AWS services -- native advantage over third-party models.
Practice Questions
Q1. A developer needs to build an application that generates short promotional video clips from product images and text descriptions. Which Amazon Nova model should they use?
- Nova Canvas
- Nova Micro
- Nova Reel
- Nova Sonic
Answer: C
Nova Reel is Amazon Nova's video generation model. It accepts text and/or image inputs and generates video output -- exactly matching this use case of creating video from product images and descriptions.
Q2. A cost-conscious team needs to process millions of text-only customer feedback messages per day using Amazon Nova, with the lowest possible cost and latency. Which model is MOST appropriate?
- Nova Premier
- Nova Pro
- Nova Lite
- Nova Micro
Answer: D
Nova Micro is the text-only model with the lowest latency and lowest cost in the Nova family. Since the use case is text-only (customer feedback) and cost/speed are the primary requirements, Nova Micro is the optimal choice.
Q3. A company is implementing model distillation on Amazon Bedrock. They want to use the highest-quality Amazon Nova model as the teacher. Which model should they select?
- Nova Pro
- Nova Lite
- Nova Premier
- Nova 2 Omni
Answer: C
Nova Premier is explicitly described as the most capable Amazon Nova model and the best teacher model for distillation workflows. It transfers its knowledge to smaller student models, making it the correct choice for the teacher role.
Q4. A company wants a single Amazon Nova model that can analyze documents with text and images, generate written reports, AND create illustration images. Which Nova model offers ALL these capabilities?
- Nova Premier
- Nova Pro + Nova Canvas (two models)
- Nova 2 Omni
- Nova Micro
Answer: C
Nova 2 Omni is the all-in-one model that combines multimodal understanding (text, images, video, documents) WITH image generation capability. It's the only single Nova model that can both analyze multimodal content and generate images.
Q5. An enterprise is building a voice-enabled AI assistant that needs to understand spoken questions and respond with spoken answers in multiple languages. Which Amazon Nova model is designed for this?
- Nova Micro
- Nova Canvas
- Nova Sonic
- Nova Pro
Answer: C
Nova Sonic is Amazon Nova's speech model, specifically designed for conversational speech understanding and generation with multilingual support. It handles the voice-to-voice interaction required for a voice-enabled assistant.
Q6. A retail company wants to implement RAG search across their product catalog, which includes both text descriptions and product images. Which Amazon Nova capability best supports this multimodal RAG use case?
- Nova Canvas for image search
- Nova 2 Multimodal Embeddings
- Nova Micro with text embeddings
- Nova Reel for video indexing
Answer: B
Nova 2 Multimodal Embeddings can create embeddings for both text and images in the same vector space. This enables RAG search across different content types -- users can search with text and find relevant images, or vice versa.