Amazon DynamoDB - Comprehensive Guide
🗃️ DynamoDB Overview
What is DynamoDB?
- NoSQL Database: Managed non-relational database
- Serverless: No infrastructure to manage
- Multi-AZ: Built-in high availability
- Performance: Single-digit millisecond latency
- Scalability: Handles millions of requests per second
Key Features
- ACID transactions: Full ACID support
- Global Tables: Multi-region replication
- Point-in-time recovery: Continuous backups
- Encryption: At rest and in transit
- Auto-scaling: Capacity scales with demand
🏗️ Data Model
Core Concepts
Table
├── Items (Rows)
│   ├── Attributes (Columns)
│   └── Primary Key
│       ├── Partition Key (Required)
│       └── Sort Key (Optional)
└── Indexes
    ├── Global Secondary Index (GSI)
    └── Local Secondary Index (LSI)
Data Types
{
"String": "Hello World",
"Number": 123.45,
"Binary": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk",
"Boolean": true,
"Null": null,
"List": [1, "two", true],
"Map": {
"name": "John",
"age": 30
},
"String Set": ["red", "green", "blue"],
"Number Set": [1, 2, 3],
"Binary Set": ["U3Vubnk=", "UmFpbnk="]
}
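The JSON above shows the logical types. On the wire, DynamoDB tags every value with a type descriptor (`S`, `N`, `B`, `BOOL`, `NULL`, `L`, `M`, `SS`, `NS`, `BS`). The following is a minimal sketch of that encoding, assuming a hypothetical helper named `to_dynamodb_json`. It is not the SDK's serializer, only an illustration of the format. Like boto3, it rejects Python floats, so decimals are passed as `Decimal`.

```python
import base64
from decimal import Decimal


def to_dynamodb_json(value):
    """Encode a Python value using DynamoDB's typed JSON descriptors (illustrative helper)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # checked before numbers because bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings to preserve precision
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, list):
        return {"L": [to_dynamodb_json(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb_json(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:
        members = sorted(value)  # sorted only to make the output deterministic
        if all(isinstance(m, str) for m in members):
            return {"SS": members}
        if all(isinstance(m, (int, Decimal)) and not isinstance(m, bool) for m in members):
            return {"NS": [str(m) for m in members]}
        if all(isinstance(m, bytes) for m in members):
            return {"BS": [base64.b64encode(m).decode("ascii") for m in members]}
    # Floats, empty sets, and anything else fall through to here.
    raise TypeError(f"unsupported value: {value!r}")
```

Sets are a distinct type from lists: a set holds unique values of a single scalar type and cannot be empty, which is why the sketch raises on an empty set.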
Primary Key Types
# Simple Primary Key (Partition Key only)
User Table:
Partition Key: UserID
Example: UserID = "user123"
# Composite Primary Key (Partition + Sort Key)
Order Table:
Partition Key: UserID
Sort Key: OrderDate
Example: UserID = "user123", OrderDate = "2024-01-15"
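To make the composite key concrete, here is a small in-memory sketch, assuming a hypothetical `ToyTable` class that is not part of any AWS SDK. It mimics two properties of a real table: the pair (partition key, sort key) identifies exactly one item, and items that share a partition key are kept ordered by sort key so that range reads are cheap.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        self._sort_keys = defaultdict(list)  # partition key -> sorted list of sort keys
        self._items = {}                     # (partition key, sort key) -> item

    def put(self, pk, sk, item):
        if (pk, sk) not in self._items:
            insort(self._sort_keys[pk], sk)
        self._items[(pk, sk)] = item         # same primary key: the item is replaced

    def query(self, pk, low=None, high=None):
        """Return items in one partition, optionally with low <= sort key <= high."""
        keys = self._sort_keys.get(pk, [])
        start = 0 if low is None else bisect_left(keys, low)
        end = len(keys) if high is None else bisect_right(keys, high)
        return [self._items[(pk, sk)] for sk in keys[start:end]]


orders = ToyTable()
orders.put("user123", "2024-01-15", {"Amount": 150})
orders.put("user123", "2024-02-03", {"Amount": 80})
orders.put("user456", "2024-01-20", {"Amount": 20})

# All January orders for one user: a single partition, one contiguous key range.
january = orders.query("user123", "2024-01-01", "2024-01-31")
```

ISO-8601 date strings work as sort keys here because their lexicographic order matches chronological order.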
🔍 Access Patterns
Query Operations
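import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource('dynamodb').Table('Orders')  # table name assumed for illustration

# Query by Partition Key
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123')
)
# Query with a Sort Key range
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123') &
                           Key('OrderDate').between('2024-01-01', '2024-01-31')
)
# Query with a Filter Expression (applied after the read, so it does not reduce consumed capacity)
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123'),
    FilterExpression=Attr('Amount').gt(100)
)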
Scan Operations
# Scan entire table (expensive!)
response = table.scan()
# Scan with a filter
response = table.scan(
    FilterExpression=Attr('Status').eq('Active')
)
# Parallel scan: each worker scans its own segment (this call covers segment 0 of 4)
response = table.scan(
    Segment=0,
    TotalSegments=4
)
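A single Scan (or Query) call returns at most 1 MB of data, so the calls above may return only the first page. The response carries `LastEvaluatedKey` when more data remains, and that value is passed back as `ExclusiveStartKey` on the next call. A minimal sketch of the loop follows. The helper name `scan_all` and the `fake_scan` stand-in are assumptions for illustration, which lets the example run without AWS credentials.

```python
def scan_all(scan_fn, **scan_kwargs):
    """Yield every item by following LastEvaluatedKey across pages."""
    while True:
        page = scan_fn(**scan_kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return  # no more pages
        scan_kwargs["ExclusiveStartKey"] = last_key


def fake_scan(ExclusiveStartKey=None, **_ignored):
    """Stand-in for table.scan that serves three items in two pages."""
    if ExclusiveStartKey is None:
        return {
            "Items": [{"UserID": "user1"}, {"UserID": "user2"}],
            "LastEvaluatedKey": {"UserID": "user2"},
        }
    return {"Items": [{"UserID": "user3"}]}


all_ids = [item["UserID"] for item in scan_all(fake_scan)]
```

With a real table the call would look like `scan_all(table.scan, FilterExpression=...)`. The same loop applies to `table.query`, and to each segment of a parallel scan independently.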
Get/Put/Update/Delete
# Get Item
response = table.get_item(
Key={'UserID': 'user123', 'OrderDate': '2024-01-15'}
)
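# Put Item (boto3 rejects Python floats; numbers must be int or Decimal)
from decimal import Decimal
table.put_item(
    Item={
        'UserID': 'user123',
        'OrderDate': '2024-01-15',
        'Amount': Decimal('150.00'),
        'Status': 'Completed'
    }
)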
# Update Item
table.update_item(
Key={'UserID': 'user123', 'OrderDate': '2024-01-15'},
UpdateExpression='SET #status = :status',
ExpressionAttributeNames={'#status': 'Status'},
ExpressionAttributeValues={':status': 'Shipped'}
)
# Delete Item
table.delete_item(
Key={'UserID': 'user123', 'OrderDate': '2024-01-15'}
)
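The update call above uses the placeholder `#status` because attribute names such as `Status` collide with DynamoDB reserved words, and `:status` because values are always bound separately from the expression. Writing these three arguments by hand gets tedious, so here is a sketch of a helper that builds them from a plain dict. The function name `build_update` and the `#n0` / `:v0` placeholder scheme are assumptions, not part of boto3.

```python
def build_update(changes):
    """Turn {'Status': 'Shipped'} into keyword arguments for update_item."""
    if not changes:
        raise ValueError("at least one attribute is required")
    names, values, assignments = {}, {}, []
    for i, (attr, value) in enumerate(changes.items()):
        names[f"#n{i}"] = attr       # placeholder sidesteps reserved words
        values[f":v{i}"] = value     # value is bound, never inlined in the expression
        assignments.append(f"#n{i} = :v{i}")
    return {
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


update_kwargs = build_update({"Status": "Shipped", "Carrier": "UPS"})
```

The result can be splatted into the call, for example `table.update_item(Key={...}, **update_kwargs)`.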
📊 Capacity Modes
On-Demand Mode
- Pay-per-request: No capacity planning required
- Auto-scaling: Handles traffic spikes automatically
- Use case: Unpredictable workloads, serverless applications
- Pricing: $0.25 per million read requests, $1.25 per million write requests (example rates; pricing varies by Region and changes over time)
Provisioned Mode
- Pre-allocated capacity: Specify RCU/WCU
- Auto-scaling: Optional capacity adjustment
- Use case: Predictable workloads, cost optimization
- Pricing: about $0.09 per RCU per month, $0.47 per WCU per month (billed hourly; example rates that vary by Region)
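The two pricing lines can be turned into a rough break-even estimate: how busy does a provisioned unit have to be before it costs less than paying per request? The sketch below uses the example rates quoted in this section and assumes a 730-hour month, with every request small enough to consume exactly one unit. The function name `break_even_utilization` is hypothetical.

```python
SECONDS_PER_MONTH = 730 * 3600  # 730 hours, a common monthly billing approximation


def break_even_utilization(on_demand_per_million, provisioned_per_unit_month):
    """Fraction of a provisioned unit that must be used before it beats on-demand."""
    requests_at_full_use = SECONDS_PER_MONTH  # one unit serves one request per second
    on_demand_cost = requests_at_full_use / 1_000_000 * on_demand_per_million
    return provisioned_per_unit_month / on_demand_cost


# Example rates taken from the text above; real prices vary by Region.
read_break_even = break_even_utilization(0.25, 0.09)    # roughly 0.14
write_break_even = break_even_utilization(1.25, 0.47)   # roughly 0.14
```

Under these assumptions, provisioned capacity is cheaper once average utilization stays above roughly 14 percent, which is why steady workloads tend to favor provisioned mode and spiky ones on-demand.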
Capacity Units
Read Capacity Unit (RCU):
  Strongly Consistent: 1 RCU = 1 read/sec of an item up to 4 KB
  Eventually Consistent: 1 RCU = 2 reads/sec of an item up to 4 KB
  Transactional: 2 RCUs = 1 read/sec of an item up to 4 KB
Write Capacity Unit (WCU):
  Standard: 1 WCU = 1 write/sec of an item up to 1 KB
  Transactional: 2 WCUs = 1 write/sec of an item up to 1 KB
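Item size is rounded up to the next 4 KB block for reads and the next 1 KB block for writes, then multiplied by a factor for the consistency mode. A worked sketch of that arithmetic follows, with hypothetical helper names and sizes given in bytes.

```python
import math


def read_capacity_units(item_bytes, mode="strong"):
    """RCUs consumed by one read; mode is 'eventual', 'strong', or 'transactional'."""
    blocks = max(1, math.ceil(item_bytes / 4096))   # reads are metered in 4 KB blocks
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[mode]
    return blocks * factor


def write_capacity_units(item_bytes, transactional=False):
    """WCUs consumed by one write."""
    blocks = max(1, math.ceil(item_bytes / 1024))   # writes are metered in 1 KB blocks
    return blocks * (2 if transactional else 1)


# A 6,000-byte item spans two 4 KB read blocks and six 1 KB write blocks.
strong_read = read_capacity_units(6000)                       # 2
eventual_read = read_capacity_units(6000, "eventual")         # 1.0
transactional_write = write_capacity_units(6000, True)        # 12
```

To size a provisioned table, multiply the per-request figure by the expected requests per second.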
🎯 Indexes
Global Secondary Index (GSI)
# Create a table with a GSI
import boto3

dynamodb = boto3.resource('dynamodb')
dynamodb.create_table(
    TableName='Orders',
    KeySchema=[
        {'AttributeName': 'UserID', 'KeyType': 'HASH'},
        {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'UserID', 'AttributeType': 'S'},
        {'AttributeName': 'OrderDate', 'AttributeType': 'S'},
        {'AttributeName': 'Status', 'AttributeType': 'S'}
    ],
    GlobalSecondaryIndexes=[
        {
            'IndexName': 'StatusIndex',
            'KeySchema': [
                {'AttributeName': 'Status', 'KeyType': 'HASH'},
                {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ],
    # On-demand billing, so neither the table nor the GSI needs ProvisionedThroughput
    BillingMode='PAY_PER_REQUEST'
)
# Query GSI
response = table.query(
IndexName='StatusIndex',
KeyConditionExpression=Key('Status').eq('Pending')
)
Local Secondary Index (LSI)
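# An LSI must be created together with the table
# Same partition key, different sort key
# (dynamodb = boto3.resource('dynamodb'))
dynamodb.create_table(
    TableName='GameScores',
    KeySchema=[
        {'AttributeName': 'UserID', 'KeyType': 'HASH'},
        {'AttributeName': 'GameTitle', 'KeyType': 'RANGE'}
    ],
    # Every key attribute, including the LSI sort key, must be declared
    AttributeDefinitions=[
        {'AttributeName': 'UserID', 'AttributeType': 'S'},
        {'AttributeName': 'GameTitle', 'AttributeType': 'S'},
        {'AttributeName': 'TopScore', 'AttributeType': 'N'}
    ],
    LocalSecondaryIndexes=[
        {
            'IndexName': 'TopScoreIndex',
            'KeySchema': [
                {'AttributeName': 'UserID', 'KeyType': 'HASH'},
                {'AttributeName': 'TopScore', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ],
    BillingMode='PAY_PER_REQUEST'
)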
GSI vs LSI Comparison
| Feature | GSI | LSI |
|---|---|---|
| Creation | Anytime | Only at table creation |
| Partition Key | Different | Same as table |
| Sort Key | Different | Different |
| Size Limit | No limit | 10 GB per partition key value |
| Consistency | Eventually consistent | Strong/eventual |
| Capacity | Separate | Shares with table |
🔄 DynamoDB Streams
What are Streams?
- Change data capture: Record data modifications
- Real-time: Near real-time processing
- Ordered: Changes to a given item appear in the order they were made
- Retention: Stream records are kept for 24 hours
Stream Types
Stream View Types:
KEYS_ONLY: Only key attributes
NEW_IMAGE: Entire item after modification
OLD_IMAGE: Entire item before modification
NEW_AND_OLD_IMAGES: Both before and after
Stream Processing
# Lambda function processing a DynamoDB Stream
import json

def lambda_handler(event, context):
    for record in event['Records']:
        event_name = record['eventName']  # INSERT, MODIFY, REMOVE
        # NewImage / OldImage are present only if the stream view type includes them
        if event_name == 'INSERT':
            new_image = record['dynamodb']['NewImage']
            # Process new item
        elif event_name == 'MODIFY':
            old_image = record['dynamodb']['OldImage']
            new_image = record['dynamodb']['NewImage']
            # Process updated item
        elif event_name == 'REMOVE':
            old_image = record['dynamodb']['OldImage']
            # Process deleted item
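A common next step inside the MODIFY branch is to work out which attributes actually changed. The sketch below assumes the stream uses the `NEW_AND_OLD_IMAGES` view type and compares the two images directly. Stream images arrive in DynamoDB's typed JSON (for example `{"S": "Shipped"}`), and comparing those typed values as-is is sufficient for change detection. The helper name `changed_attributes` and the sample record are illustrative assumptions.

```python
def changed_attributes(record):
    """Return the names of attributes added, removed, or modified by one stream record."""
    images = record["dynamodb"]
    old = images.get("OldImage", {})   # absent for INSERT
    new = images.get("NewImage", {})   # absent for REMOVE
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )


sample_record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "OldImage": {"UserID": {"S": "user123"}, "Status": {"S": "Completed"}},
        "NewImage": {
            "UserID": {"S": "user123"},
            "Status": {"S": "Shipped"},
            "Carrier": {"S": "UPS"},
        },
    },
}

changed = changed_attributes(sample_record)
```

Because a missing image is treated as empty, the same function reports every attribute for INSERT and REMOVE records.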
🔒 Security Features
Encryption
Encryption at Rest:
  Default: AWS owned key
  KMS: AWS managed key or customer managed key
  Scope: Tables, indexes, streams, and backups
Encryption in Transit:
  TLS: All API calls
  VPC Endpoints: Private connectivity
Access Control
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem"
],
"Resource": "arn:aws:dynamodb:region:account:table/MyTable",
"Condition": {
"ForAllValues:StringEquals": {
"dynamodb:LeadingKeys": ["${aws:userid}"]
}
}
}
]
}
Fine-grained Access Control
# Item-level access with Leading Keys
# The user can only access items whose UserID equals aws:userid
table.get_item(
    Key={'UserID': 'current-user-id', 'ItemID': 'item123'}
)
📈 Performance Optimization
Hot Partitions
Problem: Uneven data distribution
Causes:
- Poor partition key choice
- Time-based access patterns
- Celebrity/viral content
Solutions:
- Better partition key design
- Write sharding
- Random suffixes
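Write sharding, the second solution listed above, can be sketched as follows. A hot logical key is split across a fixed number of partition keys by appending a shard suffix. Writers pick the suffix deterministically from some per-write discriminator, and readers query every shard and merge the results. The shard count, the `#` separator, and both function names are assumptions for illustration.

```python
import zlib

SHARD_COUNT = 10  # assumed; size it to the write rate the hot key must absorb


def sharded_key(base_key, discriminator, shard_count=SHARD_COUNT):
    """Spread writes for one logical key across shard_count partition keys."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    shard = zlib.crc32(discriminator.encode("utf-8")) % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shard_count=SHARD_COUNT):
    """Partition keys a reader must query to reassemble the logical key."""
    return [f"{base_key}#{n}" for n in range(shard_count)]


write_key = sharded_key("post-42", "like-from-user123")
read_keys = all_shard_keys("post-42")
```

The trade-off is that reads fan out into one query per shard, so the shard count should stay as small as the write load allows. A random suffix works the same way on the write side but cannot be recomputed later to locate a specific item.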
Efficient Access Patterns
# ❌ Poor: Sequential access
partition_key = f"user-{timestamp}"
# ✅ Better: Even distribution (crc32 is stable across processes; built-in hash() of a str is not)
import zlib
partition_key = f"user-{zlib.crc32(user_id.encode()) % 1000}-{user_id}"
# ❌ Poor: Scan operation
response = table.scan(
FilterExpression=Attr('Category').eq('Electronics')
)
# ✅ Better: GSI query
response = table.query(
IndexName='CategoryIndex',
KeyConditionExpression=Key('Category').eq('Electronics')
)
Batch Operations
# Batch Get (up to 100 items)
response = dynamodb.batch_get_item(
RequestItems={
'MyTable': {
'Keys': [
{'UserID': 'user1', 'OrderID': 'order1'},
{'UserID': 'user2', 'OrderID': 'order2'}
]
}
}
)
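# Batch Write (BatchWriteItem accepts up to 25 items per request;
# batch_writer buffers, splits, and resends unprocessed items automatically)
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={'UserID': f'user{i}', 'Data': f'data{i}'})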
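Unlike `batch_writer`, a batch get does not retry on its own. The caller has to split the keys into groups of at most 100 and resubmit whatever comes back under `UnprocessedKeys`. The sketch below shows that loop. The names `chunked`, `batch_get_all`, and `fake_batch_get` are assumptions, and the fake stands in for the real call so the example runs offline. Production code would also add exponential backoff between rounds, which is omitted here.

```python
def chunked(seq, size):
    """Yield consecutive slices of seq with at most size elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def batch_get_all(batch_get_fn, table_name, keys, max_rounds=5):
    """Fetch all keys, resubmitting UnprocessedKeys until none remain."""
    items = []
    for chunk in chunked(keys, 100):             # 100 keys per request at most
        request = {table_name: {"Keys": chunk}}
        for _ in range(max_rounds):
            response = batch_get_fn(RequestItems=request)
            items.extend(response.get("Responses", {}).get(table_name, []))
            request = response.get("UnprocessedKeys") or {}
            if not request:
                break
        else:
            raise RuntimeError("keys still unprocessed after retries")
    return items


def fake_batch_get(RequestItems):
    """Stand-in that serves only the first key and reports the rest as unprocessed."""
    (table_name, request), = RequestItems.items()
    first, rest = request["Keys"][0], request["Keys"][1:]
    response = {"Responses": {table_name: [first]}}
    if rest:
        response["UnprocessedKeys"] = {table_name: {"Keys": rest}}
    return response


wanted = [{"UserID": "user1"}, {"UserID": "user2"}, {"UserID": "user3"}]
fetched = batch_get_all(fake_batch_get, "MyTable", wanted)
```

With boto3 the first argument would be the `batch_get_item` method of the resource or client.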
💰 Cost Optimization
Cost Factors
Storage:
Standard: $0.25 per GB/month
IA (Infrequent Access): $0.10 per GB/month
Requests:
On-Demand: $0.25 per million reads, $1.25 per million writes
Provisioned: $0.09 per RCU/month, $0.47 per WCU/month
Data Transfer:
Same Region: Free
Cross-Region: $0.02 per GB
Internet: $0.09 per GB
Cost Optimization Strategies
- Table design: Minimize item size
- Access patterns: Use queries instead of scans
- Capacity mode: Choose based on workload
- TTL: Auto-delete expired items
- Storage class: Use IA for infrequent access
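TTL works off a Number attribute holding the expiry time in epoch seconds. DynamoDB deletes the item some time after that moment, not at the exact instant, so readers may still see recently expired items. A minimal sketch of computing the value follows. The attribute name `ExpiresAt` and the helper `ttl_epoch` are assumptions, and the attribute must match whatever name is configured as the table's TTL attribute.

```python
import time


def ttl_epoch(days_from_now, now=None):
    """Expiry timestamp in whole epoch seconds, the format TTL expects."""
    now = time.time() if now is None else now
    return int(now) + days_from_now * 86_400


# Fixed 'now' so the example is reproducible; omit it in real code.
cart_item = {
    "UserID": "user123",
    "CartID": "cart-1",
    "ExpiresAt": ttl_epoch(7, now=1_700_000_000),
}
```

Millisecond timestamps or ISO strings in the TTL attribute are a frequent mistake: items carrying them will not expire when expected.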
🔄 Global Tables
Multi-region Replication
Global Table Setup:
Initial Region: us-east-1 (every replica accepts reads and writes)
Replica Regions:
- us-west-2
- eu-west-1
- ap-south-1
Replication:
Type: Asynchronous
Consistency: Eventually consistent
Conflict Resolution: Last writer wins
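"Last writer wins" has a practical consequence worth seeing: when two Regions update the same item at nearly the same time, one update silently disappears. The sketch below only illustrates that outcome. DynamoDB's actual reconciliation is internal to the service, and the `written_at` field and function name are hypothetical.

```python
def last_writer_wins(versions):
    """Return the version with the latest write timestamp; earlier ones are discarded."""
    return max(versions, key=lambda version: version["written_at"])


conflicting = [
    {"region": "us-east-1", "written_at": 1700000000.120, "Status": "Shipped"},
    {"region": "eu-west-1", "written_at": 1700000000.450, "Status": "Cancelled"},
]

survivor = last_writer_wins(conflicting)  # the eu-west-1 write survives
```

Applications that cannot tolerate a lost update typically route all writes for a given item to one Region.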
Configuration
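# Enable Global Tables (legacy version 2017.11.29 API; `dynamodb` must be a boto3 client,
# and an identical table with streams enabled must already exist in each Region).
# The current version (2019.11.21) adds replicas through update_table(ReplicaUpdates=[...]).
dynamodb.create_global_table(
    GlobalTableName='MyGlobalTable',
    ReplicationGroup=[
        {'RegionName': 'us-east-1'},
        {'RegionName': 'us-west-2'},
        {'RegionName': 'eu-west-1'}
    ]
)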
🛠️ DynamoDB Tools
AWS CLI
# Create table
aws dynamodb create-table \
--table-name MyTable \
--attribute-definitions AttributeName=ID,AttributeType=S \
--key-schema AttributeName=ID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
# Query table
aws dynamodb query \
--table-name MyTable \
--key-condition-expression "ID = :id" \
--expression-attribute-values '{":id":{"S":"user123"}}'
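# Import from S3 (the key schema is required; ID is assumed here to match the table above)
aws dynamodb import-table \
--s3-bucket-source S3Bucket=my-bucket,S3KeyPrefix=data/ \
--input-format DYNAMODB_JSON \
--table-creation-parameters '{"TableName":"ImportedTable","AttributeDefinitions":[{"AttributeName":"ID","AttributeType":"S"}],"KeySchema":[{"AttributeName":"ID","KeyType":"HASH"}],"BillingMode":"PAY_PER_REQUEST"}'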
NoSQL Workbench
- Visual design: Table modeling tool
- Data modeling: Design and visualize tables
- Sample data: Generate test data
- Code generation: Generate application code
🧪 Best Practices
Table Design
- Single table design: One table per application
- Composite keys: Enable hierarchical data
- Sparse indexes: GSI with selective attributes
- Overloaded GSI: Multiple access patterns
- Adjacent items: Store related data together
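Single-table design, composite keys, and adjacent items usually come together as a key-naming scheme. One widely used convention, assumed here rather than mandated by DynamoDB, is to use generic `PK` and `SK` attributes with type prefixes so that different entities can share a partition. A minimal sketch:

```python
def profile_key(user_id):
    """Key for the single profile item of a user."""
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}


def order_key(user_id, order_date):
    """Key for one order, stored in the same partition as the user's profile."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_date}"}


items = [
    {**order_key("user123", "2024-02-03"), "Amount": 80},
    {**profile_key("user123"), "Name": "John"},
    {**order_key("user123", "2024-01-15"), "Amount": 150},
]

# Within one partition, items are kept ordered by sort key.
partition = sorted(items, key=lambda item: item["SK"])
# Comparable to a key condition of PK = "USER#user123" AND begins_with(SK, "ORDER#").
orders = [item for item in partition if item["SK"].startswith("ORDER#")]
```

One query on the partition key returns the profile and all orders together, while a prefix condition on the sort key narrows it to a single entity type.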
Performance
- Partition key distribution: Avoid hot partitions
- Item size: Hard limit of 400 KB per item; smaller items consume less capacity
- Burst capacity: Handle traffic spikes
- Connection pooling: Reuse connections
- Compression: Compress large attributes
Cost Management
- Monitor usage: Use CloudWatch metrics
- Auto-scaling: Set up capacity scaling
- Reserved capacity: For predictable workloads
- TTL: Auto-expire old data
- IA storage: For infrequent access
📝 Exam Tips for AWS SAA
Key Scenarios
- NoSQL requirements: Flexible schema, horizontal scaling
- Real-time applications: Gaming, IoT, mobile apps
- Serverless architecture: Lambda + DynamoDB
- High-performance: Sub-10ms latency requirements
Common Patterns
E-commerce:
Product Catalog: GSI on Category, Brand
User Orders: Partition by UserID, Sort by OrderDate
Shopping Cart: TTL for abandoned carts
Gaming:
Player Profiles: Partition by PlayerID
Leaderboards: GSI sorted by Score
Game Sessions: TTL for expired sessions
IoT:
Device Data: Partition by DeviceID, Sort by Timestamp
Aggregated Metrics: Time-based partitioning
Alerts: GSI on Severity
📖 Summary
DynamoDB is a fully managed NoSQL database service that provides:
- High performance with single-digit millisecond latency
- Flexible scaling from zero to millions of requests
- Multiple consistency models and global distribution
- Comprehensive security and compliance features
- Cost-effective pricing with on-demand and provisioned options
A solid grasp of data modeling, access patterns, and performance optimization is essential for the AWS SAA exam.