Top 20 AWS Kinesis Interview Questions
- What is Amazon Kinesis?
- What are Kinesis Data Streams?
- What is a Kinesis shard?
- What is the Kinesis Client Library (KCL)?
- What is Kinesis Data Firehose?
- What is Kinesis Data Analytics?
- How do you produce data to Kinesis?
- How do you consume data from Kinesis?
- What is enhanced fan-out?
- How do you handle failures in Kinesis?
- What is data retention in Kinesis?
- How do you scale Kinesis streams?
- What are partition keys?
- What is Kinesis Video Streams?
- How do you transform data in Firehose?
- What are Kinesis Analytics windowing functions?
- How do you integrate Kinesis with Lambda?
- What are Kinesis security best practices?
- How do you monitor Kinesis?
- What are common Kinesis patterns?
1. What is Amazon Kinesis?
Amazon Kinesis is a platform for collecting, processing, and analyzing real-time streaming data at scale.
Kinesis Services:
├── Kinesis Data Streams: Real-time data streaming
├── Kinesis Data Firehose: Load streaming data to destinations
├── Kinesis Data Analytics: SQL/Flink for stream processing
└── Kinesis Video Streams: Video streaming
Use Cases:
├── Real-time analytics
├── Log and event data collection
├── IoT data ingestion
├── Clickstream analysis
├── Social media feeds
└── Gaming data processing
Architecture:
┌──────────────────────────────────────────────────────┐
│                      Producers                       │
│          (Applications, IoT, Logs, Events)           │
└──────────────────────────┬───────────────────────────┘
                           │
┌──────────────────────────▼───────────────────────────┐
│                 Kinesis Data Streams                 │
│   ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐     │
│   │Shard 1 │  │Shard 2 │  │Shard 3 │  │Shard N │     │
│   └────────┘  └────────┘  └────────┘  └────────┘     │
└──────────────────────────┬───────────────────────────┘
                           │
┌──────────────────────────▼───────────────────────────┐
│                      Consumers                       │
│       (Lambda, KCL Apps, Analytics, Firehose)        │
└──────────────────────────────────────────────────────┘
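The flow above starts with a producer writing records. A minimal sketch using boto3's put_record (the stream name, helper, and field names are illustrative):

```python
import json

def make_event(user_id, action):
    """Build a put_record payload; the partition key keeps one user's events on one shard."""
    return {
        'Data': json.dumps({'user_id': user_id, 'action': action}).encode('utf-8'),
        'PartitionKey': user_id,
    }

if __name__ == "__main__":
    import boto3
    kinesis = boto3.client('kinesis')
    # Requires an existing stream and AWS credentials
    kinesis.put_record(StreamName='my-stream', **make_event('user-42', 'click'))
```

Records with the same partition key always land on the same shard, which is what preserves per-key ordering.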
2. What are Kinesis Data Streams?
Kinesis Data Streams Features:
├── Real-time data streaming
├── Configurable retention (1-365 days)
├── Multiple consumers per stream
├── Replay capability
├── Ordering within shard
└── Encryption at rest
# Create Data Stream
import boto3

kinesis = boto3.client('kinesis')

kinesis.create_stream(
    StreamName='my-stream',
    ShardCount=4,
    StreamModeDetails={
        'StreamMode': 'PROVISIONED'  # or 'ON_DEMAND'
    }
)

# On-Demand mode (no ShardCount needed)
kinesis.create_stream(
    StreamName='my-stream-on-demand',
    StreamModeDetails={
        'StreamMode': 'ON_DEMAND'
    }
)
Data Stream Modes:
├── Provisioned: Specify shard count
│   ├── 1 MB/sec write per shard
│   ├── 2 MB/sec read per shard
│   └── 1,000 records/sec write per shard
│
└── On-Demand: Auto-scales
    ├── Up to 200 MB/sec write
    ├── Up to 400 MB/sec read
    └── Pay per GB ingested/retrieved
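Given the per-shard limits above, a provisioned stream can be sized from the target write load. A rough sketch (function name and example numbers are illustrative):

```python
import math

def shards_needed(write_mb_per_sec, records_per_sec):
    """Minimum provisioned shards for a target write load, using the
    per-shard limits of 1 MB/sec and 1,000 records/sec."""
    by_bytes = math.ceil(write_mb_per_sec / 1.0)
    by_records = math.ceil(records_per_sec / 1000.0)
    return max(1, by_bytes, by_records)

# e.g. 10 MB/sec of ~2 KB records (5,000 records/sec) -> 10 shards
print(shards_needed(10, 5000))
```

Whichever dimension (bytes or record count) is the bottleneck drives the shard count; read fan-out may require more shards still.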
3. What is a Kinesis shard?
A shard is the base throughput unit of a Kinesis data stream.
Shard Characteristics:
├── 1 MB/sec write (1,000 records/sec)
├── 2 MB/sec read (5 GetRecords calls/sec)
├── Data ordered within shard
├── 24-hour default retention (extendable to 365 days)
└── Unique sequence number per record
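Kinesis routes each record by taking the MD5 hash of its partition key (a 128-bit integer) and picking the shard whose hash key range contains it. A sketch of that routing logic (shard IDs and ranges below are illustrative):

```python
import hashlib

def hash_key(partition_key):
    """The 128-bit integer Kinesis derives from a partition key (MD5)."""
    return int(hashlib.md5(partition_key.encode('utf-8')).hexdigest(), 16)

def shard_for_key(partition_key, shard_ranges):
    """shard_ranges: list of (shard_id, start, end) inclusive hash ranges."""
    h = hash_key(partition_key)
    for shard_id, start, end in shard_ranges:
        if start <= h <= end:
            return shard_id
    raise ValueError("hash ranges do not cover the key space")

# Two shards splitting the 2^128 key space evenly
half = 2 ** 127
ranges = [('shardId-000000000000', 0, half - 1),
          ('shardId-000000000001', half, 2 ** 128 - 1)]
print(shard_for_key('user-42', ranges))
```

Because the mapping is deterministic, all records sharing a partition key land on the same shard, which is what makes per-key ordering possible.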
Shard Structure:
┌──────────────────────────────────────────────────────┐
│                        Stream                        │
├──────────────────────────────────────────────────────┤
│ Shard 1                                              │
│ Hash Range: 0 - 85070591730234615865843651857942052863
│  ┌────┬────┬────┬────┬────┬────┐                     │
│  │ R1 │ R2 │ R3 │ R4 │ R5 │... │  (Ordered)          │
│  └────┴────┴────┴────┴────┴────┘                     │
├──────────────────────────────────────────────────────┤
│ Shard 2                                              │
│ Hash Range: 85070591730234615865843651857942052864 - ...
│  ┌────┬────┬────┬────┬────┬────┐                     │
│  │ R1 │ R2 │ R3 │ R4 │ R5 │... │  (Ordered)          │
│  └────┴────┴────┴────┴────┴────┘                     │
└──────────────────────────────────────────────────────┘
# Describe stream (get shard info)
response = kinesis.describe_stream(StreamName='my-stream')
for shard in response['StreamDescription']['Shards']:
    print(f"Shard ID: {shard['ShardId']}")
    print(f"Hash Key Range: {shard['HashKeyRange']}")
    print(f"Sequence Number Range: {shard['SequenceNumberRange']}")
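With a shard ID in hand, a basic consumer polls the shard via get_shard_iterator and get_records. A hedged sketch (the stream name and JSON payload format are assumptions; real consumers also back off between polls):

```python
import json

def parse_records(get_records_response):
    """Decode the Data blobs in a GetRecords response into Python objects,
    assuming producers wrote JSON."""
    return [json.loads(r['Data']) for r in get_records_response['Records']]

if __name__ == "__main__":
    import boto3
    kinesis = boto3.client('kinesis')
    shard_id = kinesis.describe_stream(
        StreamName='my-stream')['StreamDescription']['Shards'][0]['ShardId']
    iterator = kinesis.get_shard_iterator(
        StreamName='my-stream',
        ShardId=shard_id,
        ShardIteratorType='TRIM_HORIZON'  # or 'LATEST', 'AT_SEQUENCE_NUMBER'
    )['ShardIterator']
    while iterator:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for item in parse_records(resp):
            print(item)
        if not resp['Records']:
            break  # sketch only; a real consumer would sleep and keep polling
        iterator = resp.get('NextShardIterator')
```

This raw loop is what the KCL (next question) automates across many shards and workers.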
4. What is the Kinesis Client Library (KCL)?
KCL simplifies building consumer applications with automatic load balancing and checkpointing.
KCL Features:
├── Automatic shard assignment
├── Load balancing across workers
├── Checkpointing (DynamoDB)
├── Failure handling
├── Lease management
└── Enhanced fan-out support
KCL Architecture:
┌──────────────────────────────────────────────────────┐
│                   KCL Application                    │
├──────────────────────────────────────────────────────┤
│   Worker 1         Worker 2         Worker 3         │
│  ┌─────────┐      ┌─────────┐      ┌─────────┐       │
│  │Shard 1  │      │Shard 2  │      │Shard 3  │       │
│  │Processor│      │Processor│      │Processor│       │
│  └─────────┘      └─────────┘      └─────────┘       │
│       │                │                │            │
│       └────────────────┼────────────────┘            │
│                        │                             │
│              ┌─────────▼─────────┐                   │
│              │     DynamoDB      │                   │
│              │   (Checkpoints)   │                   │
│              └───────────────────┘                   │
└──────────────────────────────────────────────────────┘
# KCL Python implementation (amazon_kclpy package, v2 record-processor interface)
from amazon_kclpy import kcl
from amazon_kclpy.v2 import processor

class RecordProcessor(processor.RecordProcessorBase):
    def initialize(self, initialize_input):
        self.shard_id = initialize_input.shard_id

    def process_records(self, process_records_input):
        for record in process_records_input.records:
            data = record.binary_data.decode('utf-8')
            # Process record
            print(f"Processing: {data}")
        # Checkpoint after processing the batch
        process_records_input.checkpointer.checkpoint()

    def shutdown(self, shutdown_input):
        # TERMINATE means the shard was closed; checkpoint so the lease ends cleanly
        if shutdown_input.reason == 'TERMINATE':
            shutdown_input.checkpointer.checkpoint()

    def shutdown_requested(self, shutdown_requested_input):
        shutdown_requested_input.checkpointer.checkpoint()

if __name__ == "__main__":
    kcl.KCLProcess(RecordProcessor()).run()

# Run with MultiLangDaemon
# java -cp kcl-2.x.jar software.amazon.kinesis.multilang.MultiLangDaemon --properties-file kcl.properties
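The MultiLangDaemon reads its settings from the properties file named on the command line. A minimal sketch of what kcl.properties might contain (the script name, application name, and region are illustrative):

```properties
# Command the daemon runs for each record processor
executableName = python record_processor.py
streamName = my-stream
# Also used as the name of the DynamoDB checkpoint table
applicationName = my-kcl-app
AWSCredentialsProvider = DefaultAWSCredentialsProviderChain
initialPositionInStream = TRIM_HORIZON
regionName = us-east-1
```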
5. What is Kinesis Data Firehose?
Kinesis Data Firehose is a fully managed service to load streaming data into data stores.
Firehose Destinations:
├── Amazon S3
├── Amazon Redshift (via S3)
├── Amazon OpenSearch
├── Splunk
├── HTTP Endpoints
└── Third-party services (Datadog, MongoDB, etc.)
Features:
├── Automatic scaling
├── Data transformation (Lambda)
├── Format conversion (Parquet, ORC)
├── Data compression (GZIP, Snappy)
├── Encryption
└── Backup to S3
# Create Firehose Delivery Stream
import json
import time

import boto3

firehose = boto3.client('firehose')

firehose.create_delivery_stream(
    DeliveryStreamName='my-firehose',
    DeliveryStreamType='DirectPut',  # or 'KinesisStreamAsSource'
    S3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/FirehoseRole',
        'BucketARN': 'arn:aws:s3:::my-bucket',
        'Prefix': 'data/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/',
        'ErrorOutputPrefix': 'errors/',
        'BufferingHints': {
            'SizeInMBs': 128,
            'IntervalInSeconds': 300
        },
        'CompressionFormat': 'GZIP',
        'EncryptionConfiguration': {
            'KMSEncryptionConfig': {
                'AWSKMSKeyARN': 'arn:aws:kms:...'
            }
        }
    }
)

# Put record to Firehose
firehose.put_record(
    DeliveryStreamName='my-firehose',
    Record={'Data': json.dumps({'event': 'click', 'timestamp': time.time()})}
)