Top 20 AWS S3 and Lake Formation Interview Questions
- What is Amazon S3?
- What are S3 storage classes?
- What are S3 versioning and lifecycle policies?
- How do you secure S3 buckets?
- What is AWS Lake Formation?
- How do you set up a data lake with Lake Formation?
- What is the Lake Formation permission model?
- What is tag-based access control (TBAC)?
- How do you implement data sharing in Lake Formation?
- What are S3 performance optimization techniques?
- What are S3 Select and Glacier Select?
- How do you configure S3 event notifications?
- What is S3 Replication?
- What are S3 access points?
- How do you implement data governance with Lake Formation?
- What is the Glue Data Catalog integration?
- How do you handle data quality in Lake Formation?
- What are governed tables?
- How do you monitor S3 and Lake Formation?
- What are best practices for S3 data lakes?
1. What is Amazon S3?
Amazon S3 (Simple Storage Service) is an object storage service offering scalability, data availability, security, and performance.
S3 Key Concepts:
├── Bucket: Container for objects
├── Object: File + metadata
├── Key: Unique identifier (path)
├── Region: Physical location
└── Version ID: Object version
S3 Features:
├── 11 9s durability (99.999999999%)
├── 99.99% availability (Standard)
├── Unlimited storage
├── Objects up to 5TB
├── Multipart upload (recommended above 100MB, required above 5GB)
└── Server-side encryption
# Upload object
import boto3
s3 = boto3.client('s3')
s3.upload_file('local_file.csv', 'my-bucket', 'data/file.csv')
# Download object
s3.download_file('my-bucket', 'data/file.csv', 'local_file.csv')
# List objects
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix='data/'):
    for obj in page.get('Contents', []):
        print(obj['Key'])
2. What are S3 storage classes?
| Class | Use Case | Retrieval | Cost |
|---|---|---|---|
| Standard | Frequent access | Immediate | $$$$ |
| Intelligent-Tiering | Unknown patterns | Immediate | $$$ |
| Standard-IA | Infrequent access | Immediate | $$ |
| One Zone-IA | Infrequent, single AZ | Immediate | $ |
| Glacier Instant | Archive, fast access | Milliseconds | $ |
| Glacier Flexible | Archive | Minutes-hours | $ |
| Glacier Deep Archive | Long-term archive | 12-48 hours | $ |
# Upload with an explicit storage class (data holds the object body as bytes)
data = b'col1,col2\nval1,val2\n'
s3.put_object(
    Bucket='my-bucket',
    Key='archive/data.csv',
    Body=data,
    StorageClass='GLACIER'
)
# Intelligent-Tiering archive config
s3.put_bucket_intelligent_tiering_configuration(
    Bucket='my-bucket',
    Id='archive-config',
    IntelligentTieringConfiguration={
        'Id': 'archive-config',
        'Status': 'Enabled',
        'Tierings': [
            {'Days': 90, 'AccessTier': 'ARCHIVE_ACCESS'},
            {'Days': 180, 'AccessTier': 'DEEP_ARCHIVE_ACCESS'}
        ]
    }
)
3. What are S3 versioning and lifecycle policies?
# Enable versioning
s3.put_bucket_versioning(
    Bucket='my-bucket',
    VersioningConfiguration={'Status': 'Enabled'}
)
# List versions
versions = s3.list_object_versions(Bucket='my-bucket', Prefix='data/')
for version in versions.get('Versions', []):
    print(f"{version['Key']} - {version['VersionId']}")
# Delete specific version
s3.delete_object(Bucket='my-bucket', Key='data/file.csv', VersionId='xxx')
# Lifecycle Policy
lifecycle_policy = {
    'Rules': [
        {
            'ID': 'TransitionToIA',
            'Status': 'Enabled',
            'Filter': {'Prefix': 'logs/'},
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                {'Days': 90, 'StorageClass': 'GLACIER'}
            ],
            'Expiration': {'Days': 365}
        },
        {
            'ID': 'DeleteOldVersions',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'NoncurrentVersionTransitions': [
                {'NoncurrentDays': 30, 'StorageClass': 'GLACIER'}
            ],
            'NoncurrentVersionExpiration': {'NoncurrentDays': 90}
        },
        {
            'ID': 'AbortIncompleteUploads',
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},
            'AbortIncompleteMultipartUpload': {'DaysAfterInitiation': 7}
        }
    ]
}
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration=lifecycle_policy
)
4. How do you secure S3 buckets?
S3 Security Layers:
1. Block Public Access (Account/Bucket level)
s3.put_public_access_block(
    Bucket='my-bucket',
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True
    }
)
2. Bucket Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnencrypted",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    },
    {
      "Sid": "EnforceSSL",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "Bool": {"aws:SecureTransport": "false"}
      }
    }
  ]
}
3. Encryption
# Default encryption
s3.put_bucket_encryption(
    Bucket='my-bucket',
    ServerSideEncryptionConfiguration={
        'Rules': [{
            'ApplyServerSideEncryptionByDefault': {
                'SSEAlgorithm': 'aws:kms',
                'KMSMasterKeyID': 'arn:aws:kms:...'
            },
            'BucketKeyEnabled': True  # Reduces KMS costs
        }]
    }
)
4. Access Logging
s3.put_bucket_logging(
    Bucket='my-bucket',
    BucketLoggingStatus={
        'LoggingEnabled': {
            'TargetBucket': 'logs-bucket',
            'TargetPrefix': 's3-access-logs/'
        }
    }
)
5. What is AWS Lake Formation?
AWS Lake Formation simplifies building, securing, and managing data lakes with centralized governance.
Lake Formation Components:
├── Data Catalog (Glue Catalog)
├── Blueprints (automated ingestion)
├── Security (fine-grained access)
├── Data Sharing (cross-account)
└── Governed Tables (ACID)
Benefits:
├── Centralized security management
├── Fine-grained access control
├── Column and row-level security
├── Data sharing without copying
└── Integration with Glue, Athena, Redshift
# Register S3 location
import boto3
lf = boto3.client('lakeformation')
lf.register_resource(
    ResourceArn='arn:aws:s3:::my-data-lake',
    UseServiceLinkedRole=True,
    HybridAccessEnabled=False
)
# Grant data location permission
lf.grant_permissions(
    Principal={'DataLakePrincipalIdentifier': 'arn:aws:iam::123456789012:role/GlueRole'},
    Resource={'DataLocation': {'ResourceArn': 'arn:aws:s3:::my-data-lake'}},
    Permissions=['DATA_LOCATION_ACCESS']
)