Scalability Principles - Building Systems that Grow

Overview

Scalability is a system's ability to handle increased load by adding resources. It is a fundamental principle of system design, and it becomes especially important when building applications for millions of users.

Defining Scalability

Scalability = Ability to handle increased load gracefully

Load có thể là:

  • Users: Increased concurrent users
  • Data: Growing database size
  • Requests: More API calls per second
  • Transactions: Higher transaction volume
  • Storage: Increased storage requirements

Types of Scaling

🔺 Vertical Scaling (Scale Up)

Add power to an existing machine by upgrading its hardware.

Before Scaling:           After Scaling:
┌─────────────┐          ┌─────────────┐
│   4 cores   │          │  16 cores   │
│   8 GB RAM  │    →     │  64 GB RAM  │
│  100 GB SSD │          │   1 TB SSD  │
└─────────────┘          └─────────────┘
    Single Server           Bigger Server

Advantages:

  • Simple: No code changes required
  • No complexity: Single machine to manage
  • Data consistency: No distributed system issues
  • ACID compliance: Database ACID properties maintained

Disadvantages:

  • Hardware limits: CPU and RAM have physical limits
  • Single point of failure: If the machine fails, the entire system goes down
  • Expensive: High-end hardware costs disproportionately more
  • Downtime: Upgrades typically require a system shutdown

When to use:

  • Legacy applications that cannot be distributed
  • Small to medium applications
  • When the development team lacks distributed systems expertise
  • Strict consistency requirements

↔️ Horizontal Scaling (Scale Out)

Add more machines to the pool of resources.

Before Scaling:           After Scaling:
┌─────────────┐          ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
│             │          │     │ │     │ │     │ │     │
│ Single      │    →     │ Web │ │ Web │ │ Web │ │ Web │
│ Server      │          │ Srv │ │ Srv │ │ Srv │ │ Srv │
│             │          │  1  │ │  2  │ │  3  │ │  4  │
└─────────────┘          └─────┘ └─────┘ └─────┘ └─────┘
    One Machine            Multiple Machines

Advantages:

  • Near-unlimited: Scaling out is theoretically unbounded
  • Fault tolerant: One machine fails, others continue
  • Cost effective: Commodity hardware is cheaper
  • Incremental: Add machines as needed

Disadvantages:

  • Complexity: Distributed system challenges
  • Data consistency: CAP theorem limitations
  • Code changes: Applications must be designed for distribution
  • Network overhead: Inter-machine communication

When to use:

  • Large-scale applications (millions of users)
  • When high availability is critical
  • Cost optimization is important
  • Team has distributed systems expertise

Scalability Patterns

1. Load Distribution Patterns

Load Balancing

                    ┌─────────────┐
     Requests   ───▶│Load Balancer│
                    └─────┬───────┘
                          │
       ┌──────────────────┼──────────────────┐
       │                  │                  │
       ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Server 1  │    │   Server 2  │    │   Server 3  │
└─────────────┘    └─────────────┘    └─────────────┘

Algorithms:

  • Round Robin: Requests distributed evenly
  • Least Connections: Send to server with fewest active connections
  • Weighted: Different servers handle different load percentages
  • IP Hash: Route based on client IP
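The first two algorithms can be sketched in a few lines of Python (the server names and connection counts here are illustrative):

```python
import itertools

servers = ["server-1", "server-2", "server-3"]

# Round Robin: hand out servers in a fixed rotation
rr = itertools.cycle(servers)

def round_robin():
    return next(rr)

# Least Connections: pick the server with the fewest active connections
active_connections = {"server-1": 12, "server-2": 3, "server-3": 7}

def least_connections():
    return min(active_connections, key=active_connections.get)

print(round_robin())        # server-1
print(round_robin())        # server-2
print(least_connections())  # server-2
```

Real load balancers (NGINX, HAProxy) implement these same strategies; the sketch only shows the selection logic, not health checks or connection tracking.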

Horizontal Partitioning (Sharding)

Users Database:

Shard 1 (Users A-H):     Shard 2 (Users I-P):     Shard 3 (Users Q-Z):
┌─────────────┐          ┌─────────────┐          ┌─────────────┐
│ Alice       │          │ John        │          │ Robert      │
│ Bob         │          │ Kate        │          │ Sarah       │
│ Charlie     │          │ Lisa        │          │ Tom         │
│ David       │          │ Mike        │          │ Victor      │
└─────────────┘          └─────────────┘          └─────────────┘

Sharding Strategies:

  • Range-based: Partition by value ranges
  • Hash-based: Use a hash function for distribution
  • Directory-based: Lookup service maps keys to shards
  • Geographic: Partition by location
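Hash-based routing, for instance, can be as simple as the following sketch (the shard count and key format are illustrative):

```python
import hashlib

NUM_SHARDS = 3

def shard_for(user_id: str) -> int:
    # A stable hash, so every node maps the same key to the same shard
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("alice"))  # same shard on every call, on every node
```

Note the drawback of plain modulo routing: changing NUM_SHARDS remaps almost every key, which is why large systems usually reach for consistent hashing instead.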

2. Caching Patterns

Cache Hierarchy

Client Request Flow:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Browser   │───▶│     CDN     │───▶│ App Server  │───▶│  Database   │
│   Cache     │    │   Cache     │    │   Cache     │    │             │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
   Fastest            Fast               Medium             Slowest

Cache Types:

  • Browser Cache: Static assets (CSS, JS, images)
  • CDN Cache: Geographic distribution
  • Application Cache: In-memory caching (Redis, Memcached)
  • Database Cache: Query result caching

Cache Strategies:

# Cache-Aside (Lazy Loading)
def get_user(user_id):
    # Try cache first
    user = cache.get(f"user:{user_id}")
    if user is None:
        # Cache miss - get from database
        user = database.get_user(user_id)
        # Store in cache for future
        cache.set(f"user:{user_id}", user, ttl=3600)
    return user

# Write-Through
def update_user_write_through(user_id, user_data):
    # Update database first
    database.update_user(user_id, user_data)
    # Then update cache immediately
    cache.set(f"user:{user_id}", user_data, ttl=3600)

# Write-Behind (Write-Back)
def update_user_write_behind(user_id, user_data):
    # Update cache immediately
    cache.set(f"user:{user_id}", user_data, ttl=3600)
    # Database update happens asynchronously
    async_queue.add_task('update_db', user_id, user_data)

3. Database Scaling Patterns

Read Replicas

                    ┌─────────────┐
  Write Requests ──▶│   Master    │
                    │  Database   │
                    └─────┬───────┘
                          │ Replication
              ┌───────────┼───────────┐
              │           │           │
              ▼           ▼           ▼
      ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
 Read │   Replica 1 │ │   Replica 2 │ │   Replica 3 │
 ───▶ │  Database   │ │  Database   │ │  Database   │
      └─────────────┘ └─────────────┘ └─────────────┘

Benefits:

  • Scale read operations
  • Reduce load on master database
  • Geographic distribution possible
  • Backup/disaster recovery
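A minimal read/write router can capture the idea: writes go to the master, reads round-robin across replicas. The connection objects and SQL-prefix check below are stand-ins, not a production driver:

```python
import itertools

WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE")

class ReplicatedDB:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def execute(self, sql, *params):
        # Writes must hit the master; reads rotate across replicas
        if sql.lstrip().upper().startswith(WRITE_PREFIXES):
            return self.primary.execute(sql, *params)
        return next(self._replicas).execute(sql, *params)
```

One caveat the sketch hides: replication is asynchronous in most setups, so a read from a replica may return slightly stale data just after a write.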

Database Partitioning

# Vertical Partitioning (by columns)
Users Table Split:
┌─────────────┐    ┌─────────────┐
│ user_id     │    │ user_id     │
│ username    │    │ email       │
│ first_name  │    │ phone       │
│ last_name   │    │ address     │
└─────────────┘    └─────────────┘
Basic Info Table   Contact Table

# Horizontal Partitioning (by rows)
Users Table Split:
┌─────────────┐    ┌─────────────┐
│ Users 1-1M  │    │ Users 1M-2M │
│ user_id     │    │ user_id     │
│ username    │    │ username    │
│ email       │    │ email       │
└─────────────┘    └─────────────┘
Partition 1         Partition 2

Scalability Metrics

Performance Metrics

Latency vs Throughput

Latency = Time to process single request
Throughput = Number of requests processed per time unit

Example:
- Latency: 100ms per request
- Throughput: 1000 requests/second

Pushing throughput toward a system's capacity usually drives latency up!

Response Time Percentiles

Response Time Distribution:
- p50 (median): 100ms  (50% of requests < 100ms)
- p95: 200ms           (95% of requests < 200ms)  
- p99: 500ms           (99% of requests < 500ms)
- p99.9: 1000ms        (99.9% of requests < 1000ms)

Focus on p95, p99 rather than average!
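Percentiles are easy to compute from recorded response times. A simple nearest-rank sketch (the sample latencies are made up for illustration):

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: smallest sample >= p percent of the data
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [80, 90, 100, 110, 120, 150, 200, 250, 400, 900]
print(percentile(latencies_ms, 50))  # 120
print(percentile(latencies_ms, 99))  # 900 - one slow request dominates the tail
```

Notice how a single 900 ms outlier barely moves the median but defines p99, which is exactly why tail percentiles matter more than averages.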

Availability Metrics

Uptime Calculation:
99.9% availability = 8.76 hours downtime/year
99.99% availability = 52.56 minutes downtime/year
99.999% availability = 5.26 minutes downtime/year

Nines Table:
90% = 36.5 days/year downtime
99% = 3.65 days/year downtime  
99.9% = 8.76 hours/year downtime
99.99% = 52.56 minutes/year downtime
99.999% = 5.26 minutes/year downtime
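The numbers in the table follow directly from the fraction of a year the system is allowed to be down:

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability_pct):
    # The unavailable fraction of the year, expressed in hours
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for nines in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_hours_per_year(nines) * 60
    print(f"{nines}% availability -> {minutes:.2f} minutes/year of downtime")
```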

Real-World Scaling Examples

Twitter - Tweet Timeline

Problem: Generate timeline for 300M users
Solution: Multiple scaling approaches

1. Pull Model (Early days):
   User requests → Generate timeline from following list
   Issues: Slow for users who follow many accounts

2. Push Model (Current):
   Tweet created → Push to all followers' pre-computed timelines
   Issues: Celebrities with millions of followers

3. Hybrid Model:
   Normal users: Push model
   Celebrities: Pull model on-demand
   Optimization based on follower count

Netflix - Video Streaming

Scaling Challenges:
- 200M+ subscribers globally
- 15 billion hours watched/month
- 4K video requires 25 Mbps bandwidth

Solutions:
1. CDN Network:
   - Content cached in 1000+ locations globally
   - 95% of traffic served from edge locations

2. Microservices:
   - 700+ microservices
   - Independent scaling per service

3. Auto-scaling:
   - Scale based on real-time demand
   - Predictive scaling for popular content releases

Instagram - Image Sharing

Scale: 1 billion users, 95 million photos/day

Database Sharding Strategy:
- Shard by user_id using consistent hashing
- 4000 database shards
- Each shard handles ~250K users
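Consistent hashing, mentioned above, maps both shards and keys onto a ring so that adding or removing a shard only moves the keys between its neighbors, instead of remapping everything as plain modulo hashing would. A minimal sketch (shard names and virtual-node count are illustrative, not Instagram's actual setup):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, shards, vnodes=100):
        # Several virtual nodes per shard give a more even key spread
        self._ring = sorted((_hash(f"{s}#{i}"), s)
                            for s in shards for i in range(vnodes))
        self._hashes = [h for h, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or past the key's hash
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._ring[idx][1]

ring = HashRing(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for("user:42"))  # deterministic shard for this key
```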

Photo Storage:
- Original photos: Amazon S3
- Processed thumbnails: CDN
- Metadata: Cassandra database

Scaling Antipatterns

Premature Optimization

Wrong: "Let's use microservices from day 1!"
Right: Start with a monolith, extract services when needed

Over-Engineering

Wrong: Build for 1M users when you have 100 users
Right: Scale incrementally based on actual demand

Ignoring Bottlenecks

Wrong: Add more web servers when database is bottleneck
Right: Identify and address actual bottlenecks first

No Monitoring

Wrong: Scale without understanding current performance
Right: Monitor metrics before making scaling decisions

Best Practices

1. Design for Scaling

  • ✅ Stateless applications
  • ✅ Database connection pooling
  • ✅ Asynchronous processing
  • ✅ Circuit breaker pattern
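The circuit breaker pattern from the list above can be sketched as a small wrapper that stops calling a failing dependency and fails fast instead (the thresholds are illustrative):

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        # While open, fail fast instead of hammering a dead dependency
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

Failing fast protects both sides: callers get an immediate error they can degrade gracefully on, and the struggling dependency gets breathing room to recover.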

2. Monitor Everything

  • ✅ Application metrics
  • ✅ Infrastructure metrics
  • ✅ Business metrics
  • ✅ User experience metrics

3. Scale Incrementally

  • ✅ Start simple, scale when needed
  • ✅ Measure before optimizing
  • ✅ Test scaling strategies
  • ✅ Plan for peak traffic

4. Prepare for Failure

  • ✅ Design for fault tolerance
  • ✅ Graceful degradation
  • ✅ Circuit breakers
  • ✅ Chaos engineering

Next Steps

  1. 📚 Study Reliability & Availability - CAP theorem, ACID properties
  2. 🎯 Learn Performance Optimization - Latency, throughput metrics
  3. 🏗️ Practice Capacity Planning - Estimation calculations
  4. 💻 Implement Caching - Redis, CDN strategies
  5. 📖 Explore Database Scaling - Sharding, read replicas

"Scalability is not about building for millions of users day one. It's about building systems that can evolve và grow gracefully as demand increases."