Amazon DynamoDB - Comprehensive Guide

🗃️ DynamoDB Overview

What Is DynamoDB?

  • NoSQL Database: Managed non-relational database
  • Serverless: No infrastructure to manage
  • Multi-AZ: Built-in high availability
  • Performance: Single-digit millisecond latency
  • Scalability: Handles millions of requests per second

Key Features

  • ACID transactions: Full ACID support
  • Global Tables: Multi-region replication
  • Point-in-time recovery: Continuous backups
  • Encryption: At rest and in transit
  • Auto-scaling: Capacity scales with demand

🏗️ Data Model

Core Concepts

Table
├── Items (Rows)
│   ├── Attributes (Columns)
│   └── Primary Key
│       ├── Partition Key (Required)
│       └── Sort Key (Optional)
└── Indexes
    ├── Global Secondary Index (GSI)
    └── Local Secondary Index (LSI)

Data Types

{
  "String": "Hello World",
  "Number": 123.45,
  "Binary": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk",
  "Boolean": true,
  "Null": null,
  "List": [1, "two", true],
  "Map": {
    "name": "John",
    "age": 30
  },
  "String Set": ["red", "green", "blue"],
  "Number Set": [1, 2, 3],
  "Binary Set": ["U3Vubnk=", "UmFpbnk="]
}

Primary Key Types

# Simple Primary Key (Partition Key only)
User Table:
  Partition Key: UserID
  Example: UserID = "user123"

# Composite Primary Key (Partition + Sort Key)
Order Table:
  Partition Key: UserID
  Sort Key: OrderDate
  Example: UserID = "user123", OrderDate = "2024-01-15"

🔍 Access Patterns

Query Operations

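import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# Query by Partition Key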
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123')
)

# Query with a sort key range
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123') & 
                          Key('OrderDate').between('2024-01-01', '2024-01-31')
)

# Query with a filter expression (applied after the key condition)
response = table.query(
    KeyConditionExpression=Key('UserID').eq('user123'),
    FilterExpression=Attr('Amount').gt(100)
)
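A FilterExpression is applied after the key condition has selected items, so items that the filter discards are still read and still consume capacity. The following is a small in-memory simulation, not a DynamoDB call; `simulate_query` and the sample orders are hypothetical and only illustrate why a response reports both Count and ScannedCount.

```python
def simulate_query(items, user_id, date_from, date_to, min_amount=None):
    """In-memory model of Query: key condition first, filter afterwards."""
    # Key condition: exact partition key match plus a sort key range.
    matched = sorted(
        (i for i in items
         if i["UserID"] == user_id and date_from <= i["OrderDate"] <= date_to),
        key=lambda i: i["OrderDate"],
    )
    # Filter: applied only to items the key condition already read.
    returned = [i for i in matched
                if min_amount is None or i["Amount"] > min_amount]
    return {"Items": returned, "Count": len(returned), "ScannedCount": len(matched)}


orders = [
    {"UserID": "user123", "OrderDate": "2024-01-05", "Amount": 40},
    {"UserID": "user123", "OrderDate": "2024-01-15", "Amount": 150},
    {"UserID": "user123", "OrderDate": "2024-02-01", "Amount": 300},
    {"UserID": "user456", "OrderDate": "2024-01-10", "Amount": 500},
]
result = simulate_query(orders, "user123", "2024-01-01", "2024-01-31", min_amount=100)
print(result["Count"], result["ScannedCount"])
```

Two January orders match the key condition, but only one passes the filter, so the simulated response returns one item while having read two.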

Scan Operations

# Scan entire table (expensive!)
response = table.scan()

# Scan with a filter
response = table.scan(
    FilterExpression=Attr('Status').eq('Active')
)

# Parallel scan: each worker reads one segment (here, segment 0 of 4)
response = table.scan(
    Segment=0,
    TotalSegments=4
)
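A single scan (or query) call returns at most 1 MB of data; when more remains, the response carries a LastEvaluatedKey that must be passed back as ExclusiveStartKey. A minimal pagination sketch follows. `scan_all` is a hypothetical helper, and `FakeTable` is a stand-in used only so the loop can run without AWS access; with boto3 you would pass a real Table resource instead.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a scan, following LastEvaluatedKey across pages."""
    kwargs = dict(scan_kwargs)
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a boto3 Table that returns two items per page."""

    def __init__(self, items):
        self._items = items

    def scan(self, ExclusiveStartKey=None, **_ignored):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["pos"]
        end = start + 2
        page = {"Items": self._items[start:end]}
        if end < len(self._items):
            page["LastEvaluatedKey"] = {"pos": end}
        return page


items = list(scan_all(FakeTable([{"ID": n} for n in range(5)])))
print(len(items))
```

The same loop shape applies to query; only the method name changes.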

Get/Put/Update/Delete

# Get Item
response = table.get_item(
    Key={'UserID': 'user123', 'OrderDate': '2024-01-15'}
)

# Put Item (boto3 rejects Python floats; use Decimal for numbers)
from decimal import Decimal

table.put_item(
    Item={
        'UserID': 'user123',
        'OrderDate': '2024-01-15',
        'Amount': Decimal('150.00'),
        'Status': 'Completed'
    }
)

# Update Item
table.update_item(
    Key={'UserID': 'user123', 'OrderDate': '2024-01-15'},
    UpdateExpression='SET #status = :status',
    ExpressionAttributeNames={'#status': 'Status'},
    ExpressionAttributeValues={':status': 'Shipped'}
)

# Delete Item
table.delete_item(
    Key={'UserID': 'user123', 'OrderDate': '2024-01-15'}
)
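The update example above writes its placeholders by hand. When several attributes change at once, the expression and both placeholder maps can be generated together. This is a sketch only: `build_update` is a hypothetical helper, it handles flat SET updates and nothing else, and the `Carrier` attribute is invented for illustration.

```python
def build_update(changes):
    """Build update_item keyword arguments for a flat dict of attribute changes."""
    names, values, clauses = {}, {}, []
    for n, (attr, value) in enumerate(sorted(changes.items())):
        names[f"#a{n}"] = attr       # name placeholder avoids reserved-word clashes
        values[f":v{n}"] = value     # value placeholder keeps data out of the expression
        clauses.append(f"#a{n} = :v{n}")
    return {
        "UpdateExpression": "SET " + ", ".join(clauses),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


kwargs = build_update({"Status": "Shipped", "Carrier": "DHL"})
print(kwargs["UpdateExpression"])
```

The result can then be passed along with the key, for example table.update_item(Key={...}, **kwargs).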

📊 Capacity Modes

On-Demand Mode

  • Pay-per-request: No capacity planning required
  • Auto-scaling: Absorbs traffic spikes automatically
  • Use case: Unpredictable workloads, serverless applications
  • Pricing: About $0.25 per million read requests and $1.25 per million write requests (illustrative; rates vary by Region and change over time)

Provisioned Mode

  • Pre-allocated capacity: Specify RCU/WCU
  • Auto-scaling: Optional capacity adjustment
  • Use case: Predictable workloads, cost optimization
  • Pricing: About $0.09 per RCU/month and $0.47 per WCU/month (illustrative; billed hourly, rates vary by Region)

Capacity Units

Read Capacity Unit (RCU):
  Strongly Consistent: 1 RCU = 1 read/sec of an item up to 4 KB
  Eventually Consistent: 1 RCU = 2 reads/sec of an item up to 4 KB
  Transactional: 2 RCUs = 1 read/sec of an item up to 4 KB

Write Capacity Unit (WCU):
  Standard: 1 WCU = 1 write/sec of an item up to 1 KB
  Transactional: 2 WCUs = 1 write/sec of an item up to 1 KB
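The capacity rules above can be turned into a quick calculator. Item sizes round up to the next 4 KB step for reads and the next 1 KB step for writes, and transactional operations cost twice the standard amount. The function names below are hypothetical; this is a sketch for exam-style arithmetic, not a billing tool.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, mode="strong"):
    """RCUs needed; mode is 'strong', 'eventual', or 'transactional'."""
    blocks = math.ceil(item_size_bytes / 4096)   # reads are sized in 4 KB steps
    per_block = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}[mode]
    return math.ceil(blocks * per_block * reads_per_second)


def write_capacity_units(item_size_bytes, writes_per_second, transactional=False):
    """WCUs needed for standard or transactional writes."""
    blocks = math.ceil(item_size_bytes / 1024)   # writes are sized in 1 KB steps
    return blocks * (2 if transactional else 1) * writes_per_second


print(read_capacity_units(6 * 1024, 10, "strong"))    # 6 KB item, 10 reads/sec
print(read_capacity_units(6 * 1024, 10, "eventual"))
print(write_capacity_units(2560, 5))                  # 2.5 KB item, 5 writes/sec
```

For the 6 KB item, each strongly consistent read spans two 4 KB steps, so 10 reads per second need 20 RCUs and the eventually consistent equivalent needs 10; the 2.5 KB write rounds up to 3 KB, giving 15 WCUs.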

🎯 Indexes

Global Secondary Index (GSI)

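# Create a table with a GSI (create_table is called on the resource or client, not on a Table)
dynamodb.create_table(
    TableName='Orders',
    BillingMode='PAY_PER_REQUEST',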
    KeySchema=[
        {'AttributeName': 'UserID', 'KeyType': 'HASH'},
        {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'UserID', 'AttributeType': 'S'},
        {'AttributeName': 'OrderDate', 'AttributeType': 'S'},
        {'AttributeName': 'Status', 'AttributeType': 'S'}
    ],
    GlobalSecondaryIndexes=[
        {
            'IndexName': 'StatusIndex',
            'KeySchema': [
                {'AttributeName': 'Status', 'KeyType': 'HASH'},
                {'AttributeName': 'OrderDate', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ]
)

# Query GSI
response = table.query(
    IndexName='StatusIndex',
    KeyConditionExpression=Key('Status').eq('Pending')
)

Local Secondary Index (LSI)

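# An LSI must be created together with the table
# Same partition key, different sort key
dynamodb.create_table(
    TableName='GameScores',
    BillingMode='PAY_PER_REQUEST',
    KeySchema=[
        {'AttributeName': 'UserID', 'KeyType': 'HASH'},
        {'AttributeName': 'GameTitle', 'KeyType': 'RANGE'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'UserID', 'AttributeType': 'S'},
        {'AttributeName': 'GameTitle', 'AttributeType': 'S'},
        {'AttributeName': 'TopScore', 'AttributeType': 'N'}
    ],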
    LocalSecondaryIndexes=[
        {
            'IndexName': 'TopScoreIndex',
            'KeySchema': [
                {'AttributeName': 'UserID', 'KeyType': 'HASH'},
                {'AttributeName': 'TopScore', 'KeyType': 'RANGE'}
            ],
            'Projection': {'ProjectionType': 'ALL'}
        }
    ]
)

GSI vs LSI Comparison

Feature        GSI                             LSI
Creation       Anytime                         Only at table creation
Partition Key  Can differ from the table's     Same as table
Sort Key       Can differ (optional)           Must differ from the table's
Size Limit     No limit                        10 GB per partition key value
Consistency    Eventually consistent only      Strong or eventual
Capacity       Separate from the table         Shared with the table

🔄 DynamoDB Streams

What are Streams?

  • Change data capture: Record data modifications
  • Real-time: Near real-time processing
  • Ordered: Changes in order per item
  • Retention: 24 hours maximum

Stream Types

Stream View Types:
  KEYS_ONLY: Only key attributes
  NEW_IMAGE: Entire item after modification
  OLD_IMAGE: Entire item before modification
  NEW_AND_OLD_IMAGES: Both before and after

Stream Processing

# Lambda function processing DynamoDB Stream
import json

def lambda_handler(event, context):
    for record in event['Records']:
        event_name = record['eventName']  # INSERT, MODIFY, REMOVE

        if event_name == 'INSERT':
            new_image = record['dynamodb']['NewImage']
            # Process new item

        elif event_name == 'MODIFY':
            old_image = record['dynamodb']['OldImage']
            new_image = record['dynamodb']['NewImage']
            # Process updated item

        elif event_name == 'REMOVE':
            old_image = record['dynamodb']['OldImage']
            # Process deleted item
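The NewImage and OldImage values in a stream record arrive in DynamoDB's typed JSON form, for example {"S": "user123"}, rather than as plain Python values. In practice boto3's TypeDeserializer (in boto3.dynamodb.types) handles the conversion; the sketch below is a simplified, dependency-free version covering only the common type tags, and `from_dynamodb_json` is a hypothetical name. Binary types (B, BS) are deliberately left out.

```python
from decimal import Decimal


def from_dynamodb_json(av):
    """Convert one DynamoDB-typed value, e.g. {"S": "x"}, to a Python value."""
    (type_tag, value), = av.items()    # each typed value has exactly one tag
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)          # numbers are transmitted as strings
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in value]
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in value.items()}
    if type_tag == "SS":
        return set(value)
    if type_tag == "NS":
        return {Decimal(v) for v in value}
    raise ValueError(f"unsupported type tag: {type_tag}")


new_image = {
    "UserID": {"S": "user123"},
    "Amount": {"N": "150"},
    "Tags": {"L": [{"S": "gift"}]},
}
item = {k: from_dynamodb_json(v) for k, v in new_image.items()}
print(item)
```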

🔒 Security Features

Encryption

Encryption at Rest:
  Default: AWS owned key
  KMS: AWS managed key or customer managed key
  Scope: Tables, indexes, streams, and backups

Encryption in Transit:
  TLS: All API calls
  VPC Endpoints: Private connectivity

Access Control

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem"
      ],
      "Resource": "arn:aws:dynamodb:region:account:table/MyTable",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${aws:userid}"]
        }
      }
    }
  ]
}

Fine-grained Access Control

# Item-level access with Leading Keys
# A user can only access items whose UserID equals aws:userid
table.get_item(
    Key={'UserID': 'current-user-id', 'ItemID': 'item123'}
)
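When the same leading-key policy is needed for several tables, it can be generated instead of copied. The sketch below rebuilds the policy from the Access Control section as a Python dict; `leading_key_policy` is a hypothetical helper, and the Region, account ID, and table name passed to it are placeholder values.

```python
import json


def leading_key_policy(region, account_id, table_name):
    """Return an IAM policy dict limiting access to items keyed by the caller's ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem"],
            "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # IAM substitutes this policy variable at evaluation time.
                    "dynamodb:LeadingKeys": ["${aws:userid}"]
                }
            },
        }],
    }


policy = leading_key_policy("us-east-1", "123456789012", "MyTable")
print(json.dumps(policy, indent=2))
```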

📈 Performance Optimization

Hot Partitions

Problem: Uneven data distribution
Causes:
  - Poor partition key choice
  - Time-based access patterns
  - Celebrity/viral content

Solutions:
  - Better partition key design
  - Write sharding
  - Random suffixes

Efficient Access Patterns

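import zlib

# ❌ Poor: time-based key sends all current writes to the same partition
partition_key = f"user-{timestamp}"

# ✅ Better: even distribution (zlib.crc32 is stable across processes,
# unlike the built-in hash(), which is randomized for strings)
partition_key = f"user-{zlib.crc32(user_id.encode()) % 1000}-{user_id}"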

# ❌ Poor: Scan operation
response = table.scan(
    FilterExpression=Attr('Category').eq('Electronics')
)

# ✅ Better: GSI query
response = table.query(
    IndexName='CategoryIndex',
    KeyConditionExpression=Key('Category').eq('Electronics')
)

Batch Operations

# Batch Get (up to 100 items)
response = dynamodb.batch_get_item(
    RequestItems={
        'MyTable': {
            'Keys': [
                {'UserID': 'user1', 'OrderID': 'order1'},
                {'UserID': 'user2', 'OrderID': 'order2'}
            ]
        }
    }
)

# Batch Write (up to 25 items per request; batch_writer splits and retries automatically)
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={'UserID': f'user{i}', 'Data': f'data{i}'})
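Unlike batch_writer, batch_get_item does not split large key lists for you, so more than 100 keys must be divided into separate requests (and any UnprocessedKeys in a response retried). A minimal chunking sketch, with `chunked` as a hypothetical helper and generated sample keys:

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


keys = [{"UserID": f"user{n}", "OrderID": f"order{n}"} for n in range(230)]
batches = chunked(keys, 100)   # BatchGetItem accepts at most 100 keys per call
print([len(b) for b in batches])
```

Each batch would then be sent as the 'Keys' list of one batch_get_item request.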

💰 Cost Optimization

Cost Factors

(Approximate list prices; rates vary by Region and change over time)

Storage:
  Standard: $0.25 per GB/month
  Standard-IA (Infrequent Access) table class: $0.10 per GB/month

Requests:
  On-Demand: $0.25 per million reads, $1.25 per million writes
  Provisioned: $0.09 per RCU/month, $0.47 per WCU/month

Data Transfer:
  Same Region: Free
  Cross-Region: $0.02 per GB
  Internet: $0.09 per GB
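To see how the two request pricing models compare, the sketch below estimates monthly read cost under each, using the on-demand and provisioned figures quoted above as constants. It assumes a steady request rate, a 30-day month, and strongly consistent reads of items up to 4 KB (one read unit each); the function names are hypothetical and the prices are only as current as the figures in the text.

```python
ON_DEMAND_PER_MILLION_READS = 0.25   # figure quoted in the text
PROVISIONED_PER_RCU_MONTH = 0.09     # figure quoted in the text
SECONDS_PER_MONTH = 30 * 24 * 3600


def on_demand_read_cost(reads_per_second):
    """Monthly cost when every read is billed per request."""
    monthly_reads = reads_per_second * SECONDS_PER_MONTH
    return monthly_reads / 1_000_000 * ON_DEMAND_PER_MILLION_READS


def provisioned_read_cost(peak_reads_per_second):
    """Monthly cost when capacity is provisioned for the peak rate."""
    return peak_reads_per_second * PROVISIONED_PER_RCU_MONTH


print(round(on_demand_read_cost(100), 2))
print(round(provisioned_read_cost(100), 2))
```

At a constant 100 reads per second, provisioned capacity comes out far cheaper; on-demand wins when traffic is spiky and the average rate sits well below the peak that would otherwise have to be provisioned.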

Cost Optimization Strategies

  1. Table design: Minimize item size
  2. Access patterns: Use queries instead of scans
  3. Capacity mode: Choose based on workload
  4. TTL: Auto-delete expired items
  5. Table class: Use Standard-IA for infrequently accessed tables
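For the TTL strategy, DynamoDB expects the TTL attribute to be a Number holding the expiry time in epoch seconds. The sketch below computes such a value; `ttl_timestamp`, the `CartID` attribute, and the `ExpiresAt` attribute name are hypothetical, and the TTL attribute still has to be enabled on the table separately.

```python
import time


def ttl_timestamp(days_from_now, now=None):
    """Epoch-seconds expiry value suitable for a TTL attribute."""
    now = time.time() if now is None else now
    return int(now) + days_from_now * 24 * 3600


item = {
    "UserID": "user123",
    "CartID": "cart-1",                                 # hypothetical attribute
    "ExpiresAt": ttl_timestamp(7, now=1_700_000_000),   # hypothetical TTL attribute
}
print(item["ExpiresAt"])
```

Expired items are removed in the background rather than at the exact expiry second, so reads that must never see stale items should also filter on the TTL attribute.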

🔄 Global Tables

Multi-region Replication

Global Table Setup:
  Replica Regions (multi-active; every replica accepts reads and writes):
    - us-east-1
    - us-west-2
    - eu-west-1
    - ap-south-1

Replication:
  Type: Asynchronous
  Consistency: Eventually consistent
  Conflict Resolution: Last writer wins
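Last-writer-wins means that when two Regions update the same item at nearly the same time, the version with the later write time survives everywhere and the other is discarded. The toy model below illustrates only that outcome; DynamoDB tracks write times internally, so the `written_at` field and the function name are hypothetical.

```python
def resolve_last_writer_wins(versions):
    """Pick the surviving version among concurrent writes to the same item."""
    return max(versions, key=lambda v: v["written_at"])


concurrent = [
    {"region": "us-east-1", "written_at": 1700000000.120, "Status": "Shipped"},
    {"region": "eu-west-1", "written_at": 1700000000.450, "Status": "Cancelled"},
]
winner = resolve_last_writer_wins(concurrent)
print(winner["region"], winner["Status"])
```

The practical consequence is that the earlier write is lost silently, so applications that cannot tolerate this should route writes for a given item to a single Region.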

Configuration

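# Enable Global Tables (legacy version 2017.11.29 API on the low-level client;
# the current version adds replicas with update_table(ReplicaUpdates=[...]))
import boto3

client = boto3.client('dynamodb')
client.create_global_table(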
    GlobalTableName='MyGlobalTable',
    ReplicationGroup=[
        {'RegionName': 'us-east-1'},
        {'RegionName': 'us-west-2'},
        {'RegionName': 'eu-west-1'}
    ]
)

🛠️ DynamoDB Tools

AWS CLI

# Create table
aws dynamodb create-table \
    --table-name MyTable \
    --attribute-definitions AttributeName=ID,AttributeType=S \
    --key-schema AttributeName=ID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

# Query table
aws dynamodb query \
    --table-name MyTable \
    --key-condition-expression "ID = :id" \
    --expression-attribute-values '{":id":{"S":"user123"}}'

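# Import from S3 (the new table's key schema must be given in the creation parameters)
aws dynamodb import-table \
    --s3-bucket-source S3Bucket=my-bucket,S3KeyPrefix=data/ \
    --input-format DYNAMODB_JSON \
    --table-creation-parameters '{"TableName":"ImportedTable","AttributeDefinitions":[{"AttributeName":"ID","AttributeType":"S"}],"KeySchema":[{"AttributeName":"ID","KeyType":"HASH"}],"BillingMode":"PAY_PER_REQUEST"}'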

NoSQL Workbench

  • Visual design: Table modeling tool
  • Data modeling: Design and visualize tables
  • Sample data: Generate test data
  • Code generation: Generate application code

🧪 Best Practices

Table Design

  1. Single table design: One table per application
  2. Composite keys: Enable hierarchical data
  3. Sparse indexes: GSI with selective attributes
  4. Overloaded GSI: Multiple access patterns
  5. Adjacent items: Store related data together
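Single-table design usually relies on generic key attributes whose values carry a type prefix, so that different entity types can share one partition. The sketch below uses the common PK/SK naming and `USER#` / `ORDER#` prefixes; these names are conventions assumed for illustration, not something DynamoDB requires, and the list comprehension only mimics what a query with begins_with on the sort key would return.

```python
def user_pk(user_id):
    """Partition key shared by a user's profile and orders."""
    return f"USER#{user_id}"


def profile_item(user_id, name):
    return {"PK": user_pk(user_id), "SK": "PROFILE", "Name": name}


def order_item(user_id, order_date, amount):
    return {"PK": user_pk(user_id), "SK": f"ORDER#{order_date}", "Amount": amount}


items = [
    profile_item("user123", "John"),
    order_item("user123", "2024-01-15", 150),
    order_item("user123", "2024-02-01", 300),
]
# Mimics: PK = "USER#user123" AND begins_with(SK, "ORDER#")
orders = [i for i in items
          if i["PK"] == "USER#user123" and i["SK"].startswith("ORDER#")]
print(len(orders))
```

Because the profile and the orders share a partition key, one query can fetch a user together with their orders, or just the orders by narrowing the sort key prefix.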

Performance

  1. Partition key distribution: Avoid hot partitions
  2. Item size: Keep items small (400 KB is the hard limit per item)
  3. Burst capacity: Handle traffic spikes
  4. Connection pooling: Reuse connections
  5. Compression: Compress large attributes

Cost Management

  1. Monitor usage: Use CloudWatch metrics
  2. Auto-scaling: Set up capacity scaling
  3. Reserved capacity: For predictable workloads
  4. TTL: Auto-expire old data
  5. Standard-IA table class: For infrequent access

📝 Exam Tips for AWS SAA

Key Scenarios

  • NoSQL requirements: Flexible schema, horizontal scaling
  • Real-time applications: Gaming, IoT, mobile apps
  • Serverless architecture: Lambda + DynamoDB
  • High-performance: Sub-10ms latency requirements

Common Patterns

E-commerce:
  Product Catalog: GSI on Category, Brand
  User Orders: Partition by UserID, Sort by OrderDate
  Shopping Cart: TTL for abandoned carts

Gaming:
  Player Profiles: Partition by PlayerID
  Leaderboards: GSI sorted by Score
  Game Sessions: TTL for expired sessions

IoT:
  Device Data: Partition by DeviceID, Sort by Timestamp
  Aggregated Metrics: Time-based partitioning
  Alerts: GSI on Severity

📖 Summary

DynamoDB is a fully managed NoSQL database service that provides:

  • High performance with single-digit millisecond latency
  • Flexible scaling from zero to millions of requests
  • Multiple consistency models and global distribution
  • Comprehensive security and compliance features
  • Cost-effective pricing with on-demand and provisioned options

A solid understanding of data modeling, access patterns, and performance optimization is essential for the AWS SAA exam.