Amazon DynamoDB - Comprehensive Guide

🗃️ DynamoDB Overview

What is DynamoDB?

NoSQL Database: Managed non-relational database
Serverless: No infrastructure to manage
Multi-AZ: Built-in high availability
Performance: Single-digit millisecond latency
Scalability: Handles millions of requests per second

Key Features

ACID transactions: Full ACID support
Global Tables: Multi-region replication
Point-in-time recovery: Continuous backups
Encryption: At rest and in transit
Auto-scaling: Capacity scales with demand

🏗️ Data Model

Core Concepts

Table
├── Items (Rows)
│   ├── Attributes (Columns)
│   └── Primary Key
│       ├── Partition Key (Required)
│       └── Sort Key (Optional)
└── Indexes
    ├── Global Secondary Index (GSI)
    └── Local Secondary Index (LSI)

Data Types

{
  "String": "Hello World",
  "Number": 123.45,
  "Binary": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk",
  "Boolean": true,
  "Null": null,
  "List": [1, "two", true],
  "Map": {
    "name": "John",
    "age": 30
  },
  "String Set": ["red", "green", "blue"],
  "Number Set": [1, 2, 3],
  "Binary Set": ["U3Vubnk=", "UmFpbnk="]
}
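
Plain JSON cannot distinguish a List from a Set, so the sample above shows both as arrays. On the wire, DynamoDB tags every value with a type descriptor (S, N, B, BOOL, NULL, L, M, SS, NS, BS). The following is a minimal sketch of that mapping; `to_dynamodb_json` is a hypothetical helper written for illustration, not part of any SDK.

```python
import base64
from decimal import Decimal


def to_dynamodb_json(value):
    """Convert a plain Python value into DynamoDB's typed JSON form (illustrative)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before numbers: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings to preserve precision
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, list):
        return {"L": [to_dynamodb_json(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb_json(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:  # empty sets are not allowed
        items = sorted(value)
        if all(isinstance(v, str) for v in items):
            return {"SS": items}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in items):
            return {"NS": [str(v) for v in items]}
        if all(isinstance(v, bytes) for v in items):
            return {"BS": [base64.b64encode(v).decode("ascii") for v in items]}
    raise TypeError(f"Unsupported value: {value!r}")
```

In this sketch a Python `list` becomes an `L` value and a Python `set` becomes `SS`, `NS`, or `BS`, which is the distinction the JSON sample cannot express.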

Primary Key Types

# Simple Primary Key (Partition Key only)
User Table:
  Partition Key: UserID
  Example: UserID = "user123"

# Composite Primary Key (Partition + Sort Key)
Order Table:
  Partition Key: UserID
  Sort Key: OrderDate
  Example: UserID = "user123", OrderDate = "2024-01-15"

🔍 Access Patterns

Query Operations

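# Setup assumed by the examples in this guide (table name 'Orders' is illustrative)
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# Query by Partition Key
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123')
)

# Query with a Sort Key range
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123') &
                          Key('OrderDate').between('2024-01-01', '2024-01-31')
)

# Query with a Filter Expression
# (the filter runs after items are read, so it does not reduce consumed capacity)
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123'),
    FilterExpression=Attr('Amount').gt(100)
)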

Scan Operations

# Scan entire table (expensive!)
response = table.scan()

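# Scan with a filter (still reads every item; the filter only trims the response)
response = table.scan(
    FilterExpression=Attr('Status').eq('Active')
)

# Parallel scan: each worker scans one segment, numbered 0 to TotalSegments - 1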
response = table.scan(
    Segment=0,
    TotalSegments=4
)
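
A single Query or Scan call returns at most 1 MB of data. When more results remain, the response carries a `LastEvaluatedKey` that must be passed back as `ExclusiveStartKey`. The helper below is a minimal sketch of that loop; `fetch_all` is a hypothetical name, and `operation` is assumed to be a bound method such as `table.query` or `table.scan`.

```python
def fetch_all(operation, **kwargs):
    """Yield every item from a paginated Query or Scan.

    `operation` is a callable such as table.query or table.scan.
    """
    while True:
        response = operation(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = last_key  # resume where the previous page stopped
```

Under those assumptions, usage would look like `for item in fetch_all(table.scan, FilterExpression=...)`. Because it is a generator, pages are fetched only as the caller consumes items.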

Get/Put/Update/Delete

# Get Item
response = table.get_item(
    Key={'UserID': 'user123', 'OrderDate': '2024-01-15'}
)

# Put Item (boto3 rejects Python floats; use Decimal for numbers)
from decimal import Decimal

table.put_item(
    Item={
        'UserID': 'user123',
        'OrderDate': '2024-01-15',
        'Amount': Decimal('150.00'),
        'Status': 'Completed'
    }
)

# Update Item
table.update_item(
    Key={'UserID': 'user123', 'OrderDate': '2024-01-15'},
    UpdateExpression='SET #status = :status',
    ExpressionAttributeNames={'#status': 'Status'},
    ExpressionAttributeValues={':status': 'Shipped'}
)

# Delete Item
table.delete_item(
    Key={'UserID': 'user123', 'OrderDate': '2024-01-15'}
)
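
The boto3 resource API raises a `TypeError` when an item contains a Python `float`, so numeric data from JSON payloads usually needs converting first. The following sketch shows one way to do that; `floats_to_decimal` is a hypothetical helper, not a boto3 function.

```python
from decimal import Decimal


def floats_to_decimal(value):
    """Recursively replace floats with Decimal so put_item will accept the item."""
    if isinstance(value, float):
        return Decimal(str(value))  # go through str to avoid binary float artifacts
    if isinstance(value, dict):
        return {k: floats_to_decimal(v) for k, v in value.items()}
    if isinstance(value, list):
        return [floats_to_decimal(v) for v in value]
    return value
```

With this helper, a call such as `table.put_item(Item=floats_to_decimal(payload))` would accept a payload parsed with `json.loads`.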

📊 Capacity Modes

On-Demand Mode

Pay-per-request: No capacity planning required
Auto-scaling: Absorbs traffic spikes automatically
Use case: Unpredictable workloads, serverless applications
Pricing: $0.25 per million read requests, $1.25 per million write requests (sample us-east-1 rates; check current AWS pricing)

Provisioned Mode

Pre-allocated capacity: Specify RCU/WCU
Auto-scaling: Optional capacity adjustment
Use case: Predictable workloads, cost optimization
Pricing: About $0.09 per RCU/month, $0.47 per WCU/month (billed hourly; sample us-east-1 rates)

Capacity Units

Read Capacity Unit (RCU):
  Strongly Consistent: 1 RCU = 1 read/sec for an item up to 4 KB
  Eventually Consistent: 1 RCU = 2 reads/sec for an item up to 4 KB
  Transactional: 2 RCUs = 1 read/sec for an item up to 4 KB

Write Capacity Unit (WCU):
  Standard: 1 WCU = 1 write/sec for an item up to 1 KB
  Transactional: 2 WCUs = 1 write/sec for an item up to 1 KB
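
Item sizes are rounded up to the next 4 KB for reads and the next 1 KB for writes, and transactional operations cost twice as much as standard ones. The arithmetic can be sketched as follows; the function names are illustrative.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, mode="strong"):
    """RCUs needed; mode is 'eventual', 'strong', or 'transactional'."""
    blocks = math.ceil(item_size_bytes / 4096)  # reads are metered in 4 KB blocks
    per_read = {"eventual": 0.5, "strong": 1, "transactional": 2}[mode] * blocks
    return math.ceil(per_read * reads_per_second)


def write_capacity_units(item_size_bytes, writes_per_second, transactional=False):
    """WCUs needed for standard or transactional writes."""
    blocks = math.ceil(item_size_bytes / 1024)  # writes are metered in 1 KB blocks
    per_write = blocks * (2 if transactional else 1)
    return per_write * writes_per_second
```

For example, reading a 6 KB item 10 times per second takes two 4 KB blocks per read: 20 RCUs with strong consistency, 10 with eventual consistency, and 40 in a transaction.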

🎯 Indexes

Global Secondary Index (GSI)

# Create a table with a GSI
# (create_table is called on the resource or client, not on a Table object)
dynamodb.create_table(
    TableName='Orders',
    KeySchema=[
        {'AttributeName': 'UserID', 'KeyType': 'HASH'},
        {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'UserID', 'AttributeType': 'S'},
        {'AttributeName': 'OrderDate', 'AttributeType': 'S'},
        {'AttributeName': 'Status', 'AttributeType': 'S'}
    ],
    GlobalSecondaryIndexes=[
        {
            'IndexName': 'StatusIndex',
            'KeySchema': [
                {'AttributeName': 'Status', 'KeyType': 'HASH'},
                {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ],
    # Required: either on-demand billing, or ProvisionedThroughput
    # for the table and for each GSI
    BillingMode='PAY_PER_REQUEST'
)

# Query GSI
response = table.query(
    IndexName='StatusIndex',
    KeyConditionExpression=Key('Status').eq('Pending')
)

Local Secondary Index (LSI)

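# An LSI must be created at the same time as the table
# Same partition key, different sort key
dynamodb.create_table(
    TableName='GameScores',
    KeySchema=[
        {'AttributeName': 'UserID', 'KeyType': 'HASH'},
        {'AttributeName': 'GameTitle', 'KeyType': 'RANGE'}
    ],
    # Every key attribute, including the LSI sort key, must be defined
    AttributeDefinitions=[
        {'AttributeName': 'UserID', 'AttributeType': 'S'},
        {'AttributeName': 'GameTitle', 'AttributeType': 'S'},
        {'AttributeName': 'TopScore', 'AttributeType': 'N'}
    ],
    LocalSecondaryIndexes=[
        {
            'IndexName': 'TopScoreIndex',
            'KeySchema': [
                {'AttributeName': 'UserID', 'KeyType': 'HASH'},
                {'AttributeName': 'TopScore', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ],
    BillingMode='PAY_PER_REQUEST'
)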

GSI vs LSI Comparison

Feature	GSI	LSI
Creation	Anytime	Only at table creation
Partition Key	Any attribute (can differ from table)	Same as table
Sort Key	Optional, any attribute	Required, different from table's
Size Limit	No limit	10 GB per partition key value
Consistency	Eventually consistent only	Strong or eventual
Capacity	Own read/write capacity	Shares the table's capacity

🔄 DynamoDB Streams

What are Streams?

Change data capture: Record data modifications
Real-time: Near real-time processing
Ordered: Changes to each item appear in the order they were made
Retention: Records are kept for 24 hours

Stream Types

Stream View Types:
  KEYS_ONLY: Only key attributes
  NEW_IMAGE: Entire item after modification
  OLD_IMAGE: Entire item before modification
  NEW_AND_OLD_IMAGES: Both before and after

Stream Processing

# Lambda function processing DynamoDB Stream
import json

def lambda_handler(event, context):
    for record in event['Records']:
        event_name = record['eventName']  # INSERT, MODIFY, REMOVE

        if event_name == 'INSERT':
            new_image = record['dynamodb']['NewImage']
            # Process new item

        elif event_name == 'MODIFY':
            old_image = record['dynamodb']['OldImage']
            new_image = record['dynamodb']['NewImage']
            # Process updated item

        elif event_name == 'REMOVE':
            old_image = record['dynamodb']['OldImage']
            # Process deleted item

🔒 Security Features

Encryption

Encryption at Rest:
  Default: AWS owned key (always enabled)
  KMS: AWS managed key (aws/dynamodb) or customer managed key
  Scope: Tables, indexes, streams, and backups

Encryption in Transit:
  TLS: All API calls
  VPC Endpoints: Private connectivity

Access Control

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem"
      ],
      "Resource": "arn:aws:dynamodb:region:account:table/MyTable",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${aws:userid}"]
        }
      }
    }
  ]
}

Fine-grained Access Control

# Item-level access with Leading Keys
# A user can only access items whose UserID equals their aws:userid
table.get_item(
    Key={'UserID': 'current-user-id', 'ItemID': 'item123'}
)

📈 Performance Optimization

Hot Partitions

Problem: Traffic (or data) concentrated on a few partitions
Causes:
  - Poor partition key choice
  - Time-based access patterns
  - Celebrity/viral content

Solutions:
  - Better partition key design
  - Write sharding
  - Random suffixes

Efficient Access Patterns

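# ❌ Poor: Time-based key concentrates current traffic on one partition
partition_key = f"user-{timestamp}"

# ✅ Better: Stable hash prefix for even distribution
# (built-in hash() is randomized per process for strings, so use hashlib)
import hashlib
bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 1000
partition_key = f"user-{bucket}-{user_id}"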

# ❌ Poor: Scan operation
response = table.scan(
    FilterExpression=Attr('Category').eq('Electronics')
)

# ✅ Better: GSI query
response = table.query(
    IndexName='CategoryIndex',
    KeyConditionExpression=Key('Category').eq('Electronics')
)

Batch Operations

# Batch Get (up to 100 items per call; retry anything returned in UnprocessedKeys)
response = dynamodb.batch_get_item(
    RequestItems={
        'MyTable': {
            'Keys': [
                {'UserID': 'user1', 'OrderID': 'order1'},
                {'UserID': 'user2', 'OrderID': 'order2'}
            ]
        }
    }
)

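# Batch Write: BatchWriteItem accepts up to 25 items per call;
# batch_writer() buffers, chunks, and resends unprocessed items automatically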
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={'UserID': f'user{i}', 'Data': f'data{i}'})
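
When calling BatchWriteItem directly rather than through `batch_writer()`, the caller has to split the items into groups of 25 and resend whatever comes back as unprocessed. A minimal sketch of that loop follows; `send_batch` is a hypothetical callable standing in for the real API call, assumed to return the items that were not written.

```python
import time


def chunked(items, size=25):
    """Split items into lists of at most `size` (25 is the BatchWriteItem limit)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_in_batches(send_batch, items, max_attempts=5, base_delay=0.05):
    """Send items in chunks, retrying whatever is reported as unprocessed.

    `send_batch` takes a list of items and returns the sub-list NOT written.
    """
    for chunk in chunked(items):
        pending = chunk
        for attempt in range(max_attempts):
            pending = send_batch(pending)
            if not pending:
                break  # whole chunk written
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        else:
            raise RuntimeError(f"{len(pending)} items still unprocessed")
```

The backoff between retries matters because unprocessed items usually signal throttling, and an immediate retry tends to be throttled again.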

💰 Cost Optimization

Cost Factors

# Sample us-east-1 rates; pricing varies by Region and changes over time
Storage:
  Standard: $0.25 per GB/month
  IA (Infrequent Access): $0.10 per GB/month

Requests:
  On-Demand: $0.25 per million reads, $1.25 per million writes
  Provisioned: $0.09 per RCU/month, $0.47 per WCU/month

Data Transfer:
  Same Region: Free
  Cross-Region: $0.02 per GB
  Internet: $0.09 per GB
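
To see how the request rates listed above compare, here is a rough sketch that prices a steady workload in both capacity modes. It reuses this guide's sample rates as defaults and assumes perfectly even traffic, items within 4 KB (reads) and 1 KB (writes), and strongly consistent reads. Storage and data transfer are ignored.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def on_demand_monthly(reads, writes, read_price=0.25, write_price=1.25):
    """Monthly cost in USD for on-demand requests (prices are per million requests)."""
    return reads / 1e6 * read_price + writes / 1e6 * write_price


def provisioned_monthly(rcu, wcu, rcu_price=0.09, wcu_price=0.47):
    """Monthly cost in USD for provisioned capacity (prices are per unit per month)."""
    return rcu * rcu_price + wcu * wcu_price


# Steady workload: 100 reads/sec and 20 writes/sec, all month
reads = 100 * SECONDS_PER_MONTH
writes = 20 * SECONDS_PER_MONTH
print(f"On-demand:   ${on_demand_monthly(reads, writes):.2f}")
print(f"Provisioned: ${provisioned_monthly(100, 20):.2f}")
```

Under these assumptions, provisioned capacity is far cheaper for constant traffic. The gap narrows or reverses when traffic is spiky, because provisioned capacity is paid for whether or not it is used.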

Cost Optimization Strategies

Table design: Minimize item size
Access patterns: Use queries instead of scans
Capacity mode: Choose based on workload
TTL: Auto-delete expired items
Storage class: Use IA for infrequent access
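
TTL works from a Number attribute holding the expiry time in Unix epoch seconds. The sketch below computes such a value; the attribute name `ExpiresAt` and the helper `ttl_epoch` are illustrative, and TTL would need to be enabled on that attribute in the table settings.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(days, now=None):
    """Return the expiry time as whole Unix epoch seconds, the format TTL expects."""
    now = now or datetime.now(timezone.utc)
    return int((now + timedelta(days=days)).timestamp())


cart_item = {
    "UserID": "user123",
    "CartID": "cart-1",              # hypothetical attribute names
    "ExpiresAt": ttl_epoch(days=7),  # item becomes eligible for deletion after 7 days
}
```

Expired items are removed in the background rather than at the exact expiry second, so reads that must exclude them should also filter on the expiry attribute.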

🔄 Global Tables

Multi-region Replication

Global Table Setup:
  Replica Regions (multi-active: every replica accepts reads and writes):
    - us-east-1
    - us-west-2
    - eu-west-1
    - ap-south-1

Replication:
  Type: Asynchronous
  Consistency: Eventually consistent across Regions
  Conflict Resolution: Last writer wins
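
The effect of last-writer-wins can be illustrated with a small model. DynamoDB performs this reconciliation internally; the `written_at` field and the function below are hypothetical and only show the consequence, assuming conflicts are resolved for the item as a whole.

```python
def last_writer_wins(version_a, version_b):
    """Return the version with the later write timestamp (ties keep version_a)."""
    if version_b["written_at"] > version_a["written_at"]:
        return version_b
    return version_a


# Two Regions update the same item at nearly the same time.
from_us_east_1 = {"written_at": 100, "item": {"OrderID": "o1", "Status": "Shipped", "Amount": 150}}
from_eu_west_1 = {"written_at": 101, "item": {"OrderID": "o1", "Status": "Pending", "Amount": 200}}

winner = last_writer_wins(from_us_east_1, from_eu_west_1)
```

In this model the later write from eu-west-1 wins and the "Shipped" status is lost, which is why applications often route all writes for a given item to one Region.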

Configuration

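# Add a replica to an existing table (global tables version 2019.11.21).
# create_global_table belongs to the legacy 2017.11.29 version.
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')
client.update_table(
    TableName='MyGlobalTable',
    ReplicaUpdates=[
        {'Create': {'RegionName': 'us-west-2'}}
    ]
)
# Other Regions (e.g. eu-west-1) can be added the same way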

🛠️ DynamoDB Tools

AWS CLI

# Create table
aws dynamodb create-table \
    --table-name MyTable \
    --attribute-definitions AttributeName=ID,AttributeType=S \
    --key-schema AttributeName=ID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

# Query table
aws dynamodb query \
    --table-name MyTable \
    --key-condition-expression "ID = :id" \
    --expression-attribute-values '{":id":{"S":"user123"}}'

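# Import from S3 (table-creation-parameters must include the key schema)
aws dynamodb import-table \
    --s3-bucket-source S3Bucket=my-bucket,S3KeyPrefix=data/ \
    --input-format DYNAMODB_JSON \
    --table-creation-parameters '{
        "TableName": "ImportedTable",
        "AttributeDefinitions": [{"AttributeName": "ID", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "ID", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST"
    }'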

NoSQL Workbench

Visual design: Table modeling tool
Data modeling: Design and visualize tables
Sample data: Generate test data
Code generation: Generate application code

🧪 Best Practices

Table Design

Single table design: One table per application
Composite keys: Enable hierarchical data
Sparse indexes: GSI with selective attributes
Overloaded GSI: Multiple access patterns
Adjacent items: Store related data together
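
Single-table design with composite keys can be sketched like this: different entity types share one table, and prefixes in the key values keep related items adjacent. The `PK`/`SK` attribute names and the `USER#`/`ORDER#` prefixes are a common convention assumed here, and `query_begins_with` is an in-memory stand-in for a real Query with `begins_with` on the sort key.

```python
def user_pk(user_id):
    return f"USER#{user_id}"


def order_sk(order_date, order_id):
    return f"ORDER#{order_date}#{order_id}"


# A user's profile and orders live under the same partition key.
items = [
    {"PK": user_pk("user123"), "SK": "PROFILE", "Name": "John"},
    {"PK": user_pk("user123"), "SK": order_sk("2024-01-15", "o1"), "Amount": 150},
    {"PK": user_pk("user123"), "SK": order_sk("2024-02-03", "o2"), "Amount": 80},
    {"PK": user_pk("user456"), "SK": order_sk("2024-01-20", "o3"), "Amount": 40},
]


def query_begins_with(all_items, pk, sk_prefix):
    """In-memory stand-in for a Query with begins_with() on the sort key."""
    matches = [i for i in all_items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])  # Query returns items in sort-key order


january_orders = query_begins_with(items, user_pk("user123"), "ORDER#2024-01")
```

Because the date sits inside the sort key, one key prefix answers "all orders", "orders in January", or "orders on a given day" without a scan or an extra index.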

Performance

Partition key distribution: Avoid hot partitions
Item size: 400 KB is the hard limit per item; smaller items cost less to read and write
Burst capacity: Handle traffic spikes
Connection pooling: Reuse connections
Compression: Compress large attributes

Cost Management

Monitor usage: Use CloudWatch metrics
Auto-scaling: Set up capacity scaling
Reserved capacity: For predictable workloads
TTL: Auto-expire old data
IA storage: For infrequent access

📝 Exam Tips for AWS SAA

Key Scenarios

NoSQL requirements: Flexible schema, horizontal scaling
Real-time applications: Gaming, IoT, mobile apps
Serverless architecture: Lambda + DynamoDB
High-performance: Sub-10ms latency requirements

Common Patterns

E-commerce:
  Product Catalog: GSI on Category, Brand
  User Orders: Partition by UserID, Sort by OrderDate
  Shopping Cart: TTL for abandoned carts

Gaming:
  Player Profiles: Partition by PlayerID
  Leaderboards: GSI sorted by Score
  Game Sessions: TTL for expired sessions

IoT:
  Device Data: Partition by DeviceID, Sort by Timestamp
  Aggregated Metrics: Time-based partitioning
  Alerts: GSI on Severity

📖 Summary

DynamoDB is a fully managed NoSQL database service that provides:
- High performance with single-digit millisecond latency
- Flexible scaling from zero to millions of requests per second
- Multiple consistency models and global distribution
- Comprehensive security and compliance features
- Cost-effective pricing with on-demand and provisioned options

A solid understanding of data modeling, access patterns, and performance optimization is essential for the AWS SAA exam.