Media Streaming Platform - AWS Architecture

🎬 Business Overview

Platform Requirements

  • Video streaming: Support millions of concurrent users
  • Content delivery: Global reach with low latency
  • Live streaming: Real-time events and sports
  • User management: Subscription and authentication
  • Analytics: Viewer behavior and content performance
  • Mobile/Web: Cross-platform compatibility

Scale Requirements

  • Peak concurrent users: 5 million globally
  • Video library: 100,000+ hours of content
  • Upload volume: 1,000+ hours/day of new content
  • Global regions: Support for 50+ countries
  • Availability: 99.99% uptime requirement
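
To make the 99.99% availability target concrete, the arithmetic below converts it into a downtime budget. This is only a sketch: the function name and the 365-day and 30-day periods are illustrative choices, not part of the requirements.

```javascript
// Convert an availability percentage into allowed downtime for a period.
function downtimeBudgetMinutes(availabilityPercent, periodDays) {
    const totalMinutes = periodDays * 24 * 60;
    return totalMinutes * (1 - availabilityPercent / 100);
}

const perYear = downtimeBudgetMinutes(99.99, 365);  // roughly 52.6 minutes
const perMonth = downtimeBudgetMinutes(99.99, 30);  // roughly 4.3 minutes
console.log(perYear.toFixed(1), perMonth.toFixed(1));
```

At this target, a single incident lasting an hour would exhaust the whole yearly budget, which is one reason for the redundant pipelines and multi-region design described later.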

🏗️ High-Level Architecture

                    Global Users
                        |
            [CloudFront CDN with Edge Locations]
                        |
        ┌───────────────┼───────────────┐
        |               |               |
   [Route 53]      [API Gateway]    [MediaLive]
        |               |               |
   [WAF/Shield]    [Lambda@Edge]   [MediaPackage]
        |               |               |
    [ALB/NLB]      [Microservices]  [MediaStore]
        |               |               |
   [ECS Fargate]   [DynamoDB/RDS]    [S3 Storage]
        |               |               |
    [ElastiCache]   [Elasticsearch]  [Glacier Archive]

🎥 Content Ingestion & Processing

Video Upload Pipeline

Content Ingestion Flow:
  1. Content Creator Upload:
     - S3 Transfer Acceleration
     - Large file multipart upload
     - Pre-signed URLs for security

  2. MediaConvert Processing:
     - Multiple resolutions (360p to 4K)
     - Adaptive bitrate streaming
     - Thumbnail generation
     - Subtitle extraction

  3. Content Validation:
     - Lambda functions for metadata
     - AI/ML content moderation
     - Quality assurance checks

  4. Storage Distribution:
     - S3 for master files
     - CloudFront origin for delivery
     - DynamoDB for metadata
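
For the multipart upload in step 1, the client has to choose a part size that respects the S3 limits (parts of at least 5 MiB except the last, and at most 10,000 parts per upload). The sketch below shows one way to plan the parts; the function name and the 64 MiB default are assumptions, and in practice each part would be sent through its own pre-signed URL.

```javascript
const MiB = 1024 * 1024;
const MIN_PART_SIZE = 5 * MiB; // S3 minimum part size (the last part may be smaller)
const MAX_PARTS = 10000;       // S3 maximum number of parts per multipart upload

// Pick a part size and part count for a file of the given size.
function planMultipartUpload(fileSizeBytes, preferredPartSize = 64 * MiB) {
    if (!Number.isInteger(fileSizeBytes) || fileSizeBytes <= 0) {
        throw new RangeError('fileSizeBytes must be a positive integer');
    }
    let partSize = Math.max(preferredPartSize, MIN_PART_SIZE);
    // Grow the part size (in whole MiB) if the file would need too many parts.
    const minSizeForCount = Math.ceil(fileSizeBytes / MAX_PARTS);
    if (minSizeForCount > partSize) {
        partSize = Math.ceil(minSizeForCount / MiB) * MiB;
    }
    const partCount = Math.ceil(fileSizeBytes / partSize);
    const lastPartSize = fileSizeBytes - (partCount - 1) * partSize;
    return { partSize, partCount, lastPartSize };
}

console.log(planMultipartUpload(1024 * MiB)); // a 1 GiB master file
```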

MediaConvert Configuration

{
  "Role": "arn:aws:iam::account:role/MediaConvertRole",
  "Settings": {
    "OutputGroups": [
      {
        "Name": "HLS",
        "OutputGroupSettings": {
          "Type": "HLS_GROUP_SETTINGS",
          "HlsGroupSettings": {
            "Destination": "s3://video-output-bucket/hls/",
            "SegmentLength": 6,
            "MinSegmentLength": 1
          }
        },
        "Outputs": [
          {
            "VideoDescription": {
              "CodecSettings": {
                "Codec": "H_264",
                "H264Settings": {
                  "Bitrate": 5000000,
                  "FramerateControl": "SPECIFIED",
                  "FramerateNumerator": 30,
                  "FramerateDenominator": 1
                }
              }
            },
            "AudioDescriptions": [
              {
                "CodecSettings": {
                  "Codec": "AAC",
                  "AacSettings": {
                    "Bitrate": 128000,
                    "SampleRate": 48000
                  }
                }
              }
            ]
          }
        ]
      }
    ]
  }
}
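
The configuration above defines a single 5 Mbps rendition, while the pipeline calls for multiple resolutions from 360p to 4K. A minimal sketch of generating the `Outputs` array from a ladder follows. The ladder entries and bitrates are illustrative assumptions rather than tuned values, and `buildHlsOutputs` is a hypothetical helper.

```javascript
// Hypothetical bitrate ladder; resolutions span the 360p-to-4K range,
// bitrates are placeholders that would need tuning per content type.
const LADDER = [
    { name: '360p',  width: 640,  height: 360,  bitrate: 800000 },
    { name: '720p',  width: 1280, height: 720,  bitrate: 3000000 },
    { name: '1080p', width: 1920, height: 1080, bitrate: 5000000 },
    { name: '2160p', width: 3840, height: 2160, bitrate: 16000000 },
];

// Build one MediaConvert output per rendition, reusing the settings shown above.
function buildHlsOutputs(ladder) {
    return ladder.map((r) => ({
        NameModifier: `_${r.name}`,
        VideoDescription: {
            Width: r.width,
            Height: r.height,
            CodecSettings: {
                Codec: 'H_264',
                H264Settings: {
                    Bitrate: r.bitrate,
                    FramerateControl: 'SPECIFIED',
                    FramerateNumerator: 30,
                    FramerateDenominator: 1,
                },
            },
        },
        AudioDescriptions: [{
            CodecSettings: {
                Codec: 'AAC',
                AacSettings: { Bitrate: 128000, SampleRate: 48000 },
            },
        }],
    }));
}

const outputs = buildHlsOutputs(LADDER);
console.log(JSON.stringify(outputs[0], null, 2));
```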

📡 Live Streaming Architecture

Live Event Pipeline

Live Streaming Flow:
  1. Input Sources:
     - RTMP/RTP streams
     - SDI/HDMI cameras
     - Mobile apps

  2. MediaLive Processing:
     - Real-time encoding
     - Multi-bitrate outputs
     - Redundant pipelines

  3. MediaPackage Delivery:
     - Just-in-time packaging
     - DRM protection
     - DVR functionality

  4. CloudFront Distribution:
     - Global edge caching
     - Viewer geolocation
     - Real-time metrics

MediaLive Channel Configuration

{
  "Name": "LiveSportsChannel",
  "InputSpecification": {
    "Codec": "AVC",
    "MaximumBitrate": "MAX_20_MBPS",
    "Resolution": "HD"
  },
  "EncoderSettings": {
    "VideoDescriptions": [
      {
        "Name": "HD1080",
        "CodecSettings": {
          "H264Settings": {
            "Bitrate": 5000000,
            "FramerateControl": "SPECIFIED",
            "FramerateNumerator": 30,
            "FramerateDenominator": 1
          }
        },
        "Height": 1080,
        "Width": 1920
      },
      {
        "Name": "HD720",
        "CodecSettings": {
          "H264Settings": {
            "Bitrate": 3000000,
            "FramerateControl": "SPECIFIED",
            "FramerateNumerator": 30,
            "FramerateDenominator": 1
          }
        },
        "Height": 720,
        "Width": 1280
      }
    ],
    "OutputGroups": [
      {
        "Name": "HLSOutput",
        "OutputGroupSettings": {
          "HlsGroupSettings": {
            "Destination": {
              "DestinationRefId": "hlsOutput"
            },
            "SegmentLength": 6,
            "PlaylistType": "EVENT"
          }
        }
      }
    ]
  }
}
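
To show what a player receives from the two renditions above, the sketch below builds an HLS master playlist by hand. MediaLive and MediaPackage generate this manifest themselves, so this is purely explanatory. The child playlist names and the 128 kbps audio bitrate are assumptions.

```javascript
// Renditions mirror the two VideoDescriptions in the channel configuration.
const renditions = [
    { name: 'HD1080', width: 1920, height: 1080, videoBitrate: 5000000 },
    { name: 'HD720',  width: 1280, height: 720,  videoBitrate: 3000000 },
];
const AUDIO_BITRATE = 128000; // assumption: same AAC bitrate as the VOD configuration

function buildMasterPlaylist(items) {
    const lines = ['#EXTM3U', '#EXT-X-VERSION:3'];
    for (const r of items) {
        const bandwidth = r.videoBitrate + AUDIO_BITRATE;
        lines.push(`#EXT-X-STREAM-INF:BANDWIDTH=${bandwidth},RESOLUTION=${r.width}x${r.height}`);
        lines.push(`${r.name}.m3u8`); // hypothetical child playlist name
    }
    return lines.join('\n') + '\n';
}

const masterPlaylist = buildMasterPlaylist(renditions);
console.log(masterPlaylist);
```

The player reads the `BANDWIDTH` values and switches between the child playlists as its measured throughput changes, which is what makes the multi-bitrate outputs adaptive.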

🌐 Global Content Delivery

CloudFront Configuration

CloudFront Distribution:
  Origins:
    - S3 Bucket: Static video files
    - MediaPackage: Live streams
    - ALB: API endpoints

  Behaviors:
    /api/*: 
      - Origin: ALB
      - Cache: None
      - Compress: Yes

    /live/*:
      - Origin: MediaPackage
      - Cache: Custom (30 seconds)
      - Viewer Protocol: HTTPS only

    /videos/*:
      - Origin: S3
      - Cache: 1 year
      - Compress: Yes
      - Origin Shield: Yes

  Security:
    - WAF Integration
    - Signed URLs for premium content
    - Geo-restriction for licensing
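
The signed URLs for premium content can be illustrated with a canned-policy signature built on Node's `crypto` module. This is a sketch under assumptions: the distribution domain, key pair ID, and expiry are placeholders, and the key pair is generated on the fly only so the example runs. A real deployment would use the private key registered with the distribution and would normally rely on the AWS SDK's signer.

```javascript
const crypto = require('crypto');

// Sign a URL with a CloudFront canned policy (expiry time only).
function signCloudFrontUrl(url, keyPairId, privateKeyPem, expiresEpochSeconds) {
    const policy = JSON.stringify({
        Statement: [{
            Resource: url,
            Condition: { DateLessThan: { 'AWS:EpochTime': expiresEpochSeconds } },
        }],
    });
    // CloudFront expects base64 with +, = and / replaced by URL-safe characters.
    const signature = crypto.createSign('RSA-SHA1')
        .update(policy)
        .sign(privateKeyPem, 'base64')
        .replace(/\+/g, '-')
        .replace(/=/g, '_')
        .replace(/\//g, '~');
    const separator = url.includes('?') ? '&' : '?';
    return `${url}${separator}Expires=${expiresEpochSeconds}` +
        `&Signature=${signature}&Key-Pair-Id=${keyPairId}`;
}

// Throwaway key pair and placeholder identifiers, for demonstration only.
const { privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const pem = privateKey.export({ type: 'pkcs1', format: 'pem' });
const signedUrl = signCloudFrontUrl(
    'https://d111111abcdef8.cloudfront.net/videos/movie.m3u8',
    'K2EXAMPLEKEYID',
    pem,
    1767225600 // fixed example expiry (epoch seconds)
);
console.log(signedUrl);
```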

Edge Computing with Lambda@Edge

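// Viewer request function
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const headers = request.headers;

    // Device detection (the User-Agent header may be absent)
    const userAgent = headers['user-agent'] ? headers['user-agent'][0].value : '';
    const isMobile = /Mobile|Android|iPhone/.test(userAgent);

    // Geographic content restriction.
    // CloudFront adds CloudFront-Viewer-Country after the viewer request event,
    // so this check only takes effect when the function runs on an origin
    // request trigger with the header forwarded. A missing header is treated
    // as "unknown" instead of throwing.
    const countryHeader = headers['cloudfront-viewer-country'];
    const country = countryHeader ? countryHeader[0].value : null;
    const restrictedCountries = ['CN', 'RU', 'IR'];

    if (country && restrictedCountries.includes(country)) {
        const response = {
            status: '403',
            statusDescription: 'Forbidden',
            body: 'Content not available in your region'
        };
        callback(null, response);
        return;
    }

    // Adaptive URL rewriting: send mobile devices to the mobile renditions
    if (isMobile && request.uri.includes('/video/')) {
        request.uri = request.uri.replace('/video/', '/video/mobile/');
    }

    callback(null, request);
};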

// Origin response function for caching (deployed as a separate Lambda function)
exports.handler = (event, context, callback) => {
    const response = event.Records[0].cf.response;
    const headers = response.headers;

    // Add custom cache headers
    headers['cache-control'] = [
        {
            key: 'Cache-Control',
            value: 'public, max-age=86400, s-maxage=31536000'
        }
    ];

    // Add security headers
    headers['strict-transport-security'] = [
        {
            key: 'Strict-Transport-Security',
            value: 'max-age=31536000; includeSubDomains'
        }
    ];

    callback(null, response);
};

👤 User Management & Authentication

Cognito User Pool Configuration

User Authentication:
  Cognito User Pool:
    - Email/Username login
    - MFA support
    - Social identity providers (Google, Facebook)
    - Custom attributes (subscription_tier, preferences)

  User Journey:
    1. Registration/Login → Cognito
    2. JWT Token → API Gateway
    3. Authorization → Lambda
    4. Content Access → Signed URLs

  Subscription Management:
    - Free tier: Ads supported
    - Premium tier: Ad-free, 4K content
    - Family tier: Multiple profiles
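
A sketch of how the `subscription_tier` custom attribute might drive entitlement checks is shown below. Cognito exposes custom attributes in the ID token with a `custom:` prefix. The limits per tier (maximum resolution, profile counts, and whether the family tier is ad-free) are assumptions added for illustration, and the claims are assumed to come from a token that has already been verified.

```javascript
// Tier names follow the list above; the numeric limits are illustrative assumptions.
const TIERS = {
    free:    { ads: true,  maxHeight: 1080, profiles: 1 },
    premium: { ads: false, maxHeight: 2160, profiles: 1 },
    family:  { ads: false, maxHeight: 2160, profiles: 5 },
};

// Look up entitlements from verified token claims; unknown tiers fall back to free.
function entitlementsFor(claims) {
    const tier = claims['custom:subscription_tier'];
    return Object.prototype.hasOwnProperty.call(TIERS, tier) ? TIERS[tier] : TIERS.free;
}

// Decide whether a rendition of the given height may be served to this user.
function canPlay(claims, renditionHeight) {
    return renditionHeight <= entitlementsFor(claims).maxHeight;
}

console.log(canPlay({ 'custom:subscription_tier': 'premium' }, 2160));
console.log(canPlay({ 'custom:subscription_tier': 'free' }, 2160));
```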

API Gateway with JWT Authorization

API Gateway Configuration:
  Authorizers:
    - Cognito User Pool
    - Custom Lambda authorizer

  Resources:
    /users:
      - GET: User profile
      - PUT: Update preferences
      - POST: Subscription management

    /content:
      - GET: Content catalog
      - POST: Search and recommendations

    /streaming:
      - GET: Streaming URLs
      - POST: Playback analytics
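
The custom Lambda authorizer listed above returns an IAM policy that tells API Gateway whether to allow the call. The following is a minimal sketch of a token-based authorizer. `verifyToken` is a hypothetical helper, injected so the example stays self-contained; it is assumed to validate the JWT signature and expiry and to resolve to the token's claims.

```javascript
// Build the response shape API Gateway expects from a Lambda authorizer.
function buildAuthorizerResponse(principalId, effect, methodArn, context = {}) {
    return {
        principalId,
        policyDocument: {
            Version: '2012-10-17',
            Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: methodArn }],
        },
        context, // values must be strings, numbers, or booleans
    };
}

// verifyToken is a hypothetical async function: token -> verified claims (or throws).
function makeAuthorizer(verifyToken) {
    return async (event) => {
        const token = (event.authorizationToken || '').replace(/^Bearer\s+/i, '');
        try {
            const claims = await verifyToken(token);
            return buildAuthorizerResponse(claims.sub, 'Allow', event.methodArn, {
                tier: String(claims['custom:subscription_tier'] || 'free'),
            });
        } catch (err) {
            // Any verification failure results in an explicit deny.
            return buildAuthorizerResponse('anonymous', 'Deny', event.methodArn);
        }
    };
}
```

In a deployed function, `exports.handler = makeAuthorizer(realVerifier)` would wire this up, and downstream Lambdas could read `tier` from the request context.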

📊 Data Architecture

Database Strategy

DynamoDB Tables:
  Users:
    PK: UserID
    Attributes: profile, subscription, preferences
    GSI: email, subscription_tier

  Content:
    PK: ContentID
    SK: Version
    Attributes: metadata, encoding_status, analytics
    GSI: genre, release_date, popularity

  ViewingSessions:
    PK: UserID
    SK: SessionTimestamp
    Attributes: content_id, duration, quality, location
    TTL: 90 days

  Recommendations:
    PK: UserID
    SK: ContentID
    Attributes: score, generated_at, viewed
    TTL: 30 days
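
DynamoDB TTL works on a per-item Number attribute holding an expiry time in epoch seconds, so the "TTL: 90 days" on ViewingSessions has to be computed when the item is written. A sketch of building such an item follows; the attribute name `expires_at` and the shape of `details` are assumptions.

```javascript
const NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60;

// Build a ViewingSessions item whose TTL attribute expires 90 days after the session start.
function buildViewingSessionItem(userId, startedAt, details) {
    const startedSeconds = Math.floor(startedAt.getTime() / 1000);
    return {
        UserID: userId,                                   // partition key
        SessionTimestamp: startedAt.toISOString(),        // sort key
        content_id: details.contentId,
        duration: details.durationSeconds,
        quality: details.quality,
        location: details.location,
        expires_at: startedSeconds + NINETY_DAYS_SECONDS, // hypothetical TTL attribute name
    };
}

const item = buildViewingSessionItem('user-123', new Date('2024-01-01T00:00:00Z'), {
    contentId: 'content-42',
    durationSeconds: 1800,
    quality: '1080p',
    location: 'VN',
});
console.log(item);
```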

RDS (PostgreSQL):
  Content Management:
    - Content metadata
    - User subscriptions
    - Financial transactions
    - Reporting and analytics

  Read Replicas:
    - Analytics queries
    - Reporting dashboards
    - Data warehouse ETL

Real-time Analytics

Kinesis Data Streams:
  Player Events:
    - Play/pause/seek events
    - Quality changes
    - Buffering metrics
    - Error tracking

  Processors:
    - Kinesis Analytics: Real-time metrics
    - Lambda: Event processing
    - Elasticsearch: Log aggregation
    - Redshift: Data warehousing
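
Player events are usually sent to Kinesis in batches. The sketch below shapes events into records and splits them into groups of at most 500, the per-request record limit of `PutRecords`. The event fields (`sessionId`, `type`) are hypothetical, and request size limits and retry handling are left out.

```javascript
const MAX_RECORDS_PER_REQUEST = 500; // PutRecords limit on records per call

// Turn player events into Kinesis records and split them into PutRecords-sized batches.
function toKinesisBatches(events) {
    const records = events.map((e) => ({
        // Using the session as partition key keeps one session's events ordered on a shard.
        PartitionKey: e.sessionId,
        Data: Buffer.from(JSON.stringify(e)),
    }));
    const batches = [];
    for (let i = 0; i < records.length; i += MAX_RECORDS_PER_REQUEST) {
        batches.push(records.slice(i, i + MAX_RECORDS_PER_REQUEST));
    }
    return batches;
}

const sampleEvents = Array.from({ length: 1201 }, (_, i) => ({
    sessionId: `session-${i % 10}`,
    type: 'buffer_start',
    t: i,
}));
const batches = toKinesisBatches(sampleEvents);
console.log(batches.map((b) => b.length));
```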

🤖 AI/ML Integration

Content Recommendation Engine

Recommendation Pipeline:
  Data Sources:
    - Viewing history (DynamoDB)
    - Content metadata (RDS)
    - User preferences (Cognito)
    - Real-time events (Kinesis)

  ML Models:
    - SageMaker: Collaborative filtering
    - Personalize: Real-time recommendations
    - Comprehend: Content categorization
    - Rekognition: Video analysis

  Recommendation Types:
    - Trending content
    - Similar content
    - Personalized picks
    - Continue watching

Content Moderation

Automated Moderation:
  Rekognition Video:
    - Explicit content detection
    - Violence and unsafe content
    - Celebrity recognition
    - Text in video analysis

  Transcribe + Comprehend:
    - Audio transcription
    - Sentiment analysis
    - Inappropriate language detection
    - Content categorization

  Custom Models:
    - Brand-specific rules
    - Cultural sensitivity
    - Age-appropriate content
    - Copyright detection

📈 Monitoring & Analytics

Business Intelligence Dashboard

CloudWatch Dashboards:
  Real-time Metrics:
    - Concurrent viewers
    - Stream quality metrics
    - Geographic distribution
    - Device breakdown

  Business KPIs:
    - New subscriptions
    - Churn rate
    - Content popularity
    - Revenue metrics

QuickSight Analytics:
  - Executive dashboards
  - Content performance reports
  - User engagement analysis
  - Financial reporting

Performance Monitoring

Application Monitoring:
  - API Gateway metrics
  - Lambda function performance
  - DynamoDB throttling
  - MediaLive stream health

Player Analytics:
  - Startup time
  - Buffering ratio
  - Video quality distribution
  - Error rates by region

Cost Monitoring:
  - CloudFront bandwidth costs
  - MediaConvert processing costs
  - Storage costs by tier
  - Compute costs optimization
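
The player analytics metrics above can be derived from the raw event stream. The sketch below computes startup time and buffering ratio for one session. The event names and the definition of buffering ratio (stall time divided by time since the first frame) are assumptions, since the exact definitions vary between players.

```javascript
// events: [{ type, t }] with t in milliseconds; event type names are hypothetical.
function summarizeSession(events) {
    let requestedAt = null;
    let firstFrameAt = null;
    let endedAt = null;
    let bufferStart = null;
    let bufferingMs = 0;

    for (const e of events) {
        if (e.type === 'play_requested') {
            requestedAt = e.t;
        } else if (e.type === 'first_frame' && firstFrameAt === null) {
            firstFrameAt = e.t;
        } else if (e.type === 'buffer_start') {
            bufferStart = e.t;
        } else if (e.type === 'buffer_end' && bufferStart !== null) {
            bufferingMs += e.t - bufferStart;
            bufferStart = null;
        } else if (e.type === 'ended') {
            endedAt = e.t;
        }
    }
    if (requestedAt === null || firstFrameAt === null || endedAt === null) {
        return null; // incomplete session, nothing to report
    }
    const sessionMs = endedAt - firstFrameAt;
    return {
        startupMs: firstFrameAt - requestedAt,
        bufferingRatio: sessionMs > 0 ? bufferingMs / sessionMs : 0,
    };
}

const summary = summarizeSession([
    { type: 'play_requested', t: 0 },
    { type: 'first_frame', t: 800 },
    { type: 'buffer_start', t: 10000 },
    { type: 'buffer_end', t: 10500 },
    { type: 'ended', t: 100800 },
]);
console.log(summary);
```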

🔒 Security & DRM

Content Protection

DRM Implementation:
  PlayReady: Microsoft ecosystem
  Widevine: Google/Android
  FairPlay: Apple ecosystem

  Key Management:
    - AWS KMS for encryption keys
    - Secure key rotation
    - Multi-tenancy support

  License Server:
    - Custom Lambda functions
    - Integration with DRM providers
    - User entitlement validation

Security Best Practices

Network Security:
  - VPC with private subnets
  - Security groups with restrictive rules
  - WAF for application protection
  - Shield Advanced for DDoS

Application Security:
  - API rate limiting
  - Input validation
  - Signed URLs for content
  - JWT token validation

Data Protection:
  - Encryption at rest (S3, RDS, DynamoDB)
  - Encryption in transit (TLS/SSL)
  - Field-level encryption
  - PII data anonymization

💰 Cost Optimization

Storage Optimization

S3 Lifecycle Policies:
  - Standard: Active content (0-30 days)
  - IA: Less popular content (30-90 days)
  - Glacier: Archive content (90+ days)
  - Deep Archive: Long-term storage
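
The tiers above map onto an S3 lifecycle configuration along the lines of the sketch below. The rule ID, the key prefix, and the 365-day threshold for Deep Archive are assumptions (the list gives no day count for that tier). Note that lifecycle transitions are driven by object age, not popularity.

```javascript
// Shape follows the S3 bucket lifecycle configuration; names marked below are hypothetical.
const lifecycleConfiguration = {
    Rules: [{
        ID: 'video-tiering',             // hypothetical rule name
        Status: 'Enabled',
        Filter: { Prefix: 'masters/' },  // hypothetical key prefix
        Transitions: [
            { Days: 30, StorageClass: 'STANDARD_IA' },
            { Days: 90, StorageClass: 'GLACIER' },
            { Days: 365, StorageClass: 'DEEP_ARCHIVE' }, // assumed threshold
        ],
    }],
};

// Report which storage class an object of the given age would be in.
function storageClassForAge(ageDays, transitions = lifecycleConfiguration.Rules[0].Transitions) {
    let current = 'STANDARD';
    for (const t of transitions) {
        if (ageDays >= t.Days) current = t.StorageClass;
    }
    return current;
}

console.log([10, 45, 120, 400].map((d) => storageClassForAge(d)));
```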

Content Optimization:
  - Intelligent tiering
  - Regional replication strategy
  - Compression optimization
  - Duplicate content detection

Compute Cost Management

ECS/Fargate Optimization:
  - Fargate Spot for interruption-tolerant batch processing
  - Compute Savings Plans for predictable workloads
  - Auto-scaling based on demand
  - Right-sizing containers

Lambda Optimization:
  - Memory optimization
  - Provisioned concurrency for critical functions
  - Step Functions for orchestration
  - EventBridge for event routing

🚀 Scalability Patterns

Auto-scaling Strategy

Application Scaling:
  ECS Services:
    - CPU/Memory based scaling
    - Custom metrics (concurrent users)
    - Scheduled scaling for events

  Database Scaling:
    - DynamoDB on-demand mode
    - RDS read replicas
    - ElastiCache cluster scaling

  CDN Scaling:
    - CloudFront automatic scaling
    - Origin Shield optimization
    - Edge location utilization
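
Scaling on a custom metric such as concurrent users per task can be reasoned about with a simple proportional rule, sketched below. This approximates the idea behind target tracking and is not the actual Application Auto Scaling algorithm, which also applies cooldowns and alarm evaluation. All numbers are illustrative.

```javascript
// Scale the task count in proportion to how far the metric is from its target.
function desiredTaskCount(currentTasks, metricValue, targetValue, minTasks, maxTasks) {
    const raw = Math.ceil(currentTasks * (metricValue / targetValue));
    return Math.min(maxTasks, Math.max(minTasks, raw));
}

// 10 tasks at 90% CPU with a 60% target -> scale out to 15 tasks.
console.log(desiredTaskCount(10, 90, 60, 2, 100));
// 10 tasks at 30% CPU with a 60% target -> scale in to 5 tasks.
console.log(desiredTaskCount(10, 30, 60, 2, 100));
```

For known events such as a sports final, scheduled scaling would raise `minTasks` ahead of time so the service does not have to react to the spike.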

Global Expansion

Multi-Region Strategy:
  Primary Region: us-east-1
  Secondary Regions: eu-west-1, ap-southeast-1

  Content Strategy:
    - Global content replication
    - Regional content libraries
    - Localized recommendations
    - Compliance with local regulations

📖 Key Takeaways

Architecture Principles

  1. Microservices: Independent scaling and deployment
  2. Event-driven: Asynchronous processing
  3. CDN-first: Global content delivery optimization
  4. Serverless: Reduced operational overhead
  5. Multi-region: Global availability and compliance

Technical Decisions

  • Storage: S3 with Intelligent-Tiering for cost optimization
  • Compute: A mix of ECS Fargate and Lambda for different workloads
  • Database: DynamoDB for scale, RDS for complex queries
  • CDN: CloudFront with edge computing capabilities
  • Streaming: AWS Media Services for professional-grade video

This platform demonstrates comprehensive use of AWS services for building a scalable, global media streaming solution with enterprise-grade features.