System Design Interview Preparation

Overview

A system design interview evaluates your ability to design large-scale distributed systems. This guide covers the interview process, common questions, design patterns, and preparation tips.

Interview Process Framework

Typical Interview Structure

1. Problem Clarification (5-10 minutes)
   - Understand requirements
   - Ask clarifying questions
   - Define scope

2. High-Level Design (10-15 minutes)
   - Draw basic architecture
   - Identify major components
   - Show data flow

3. Detailed Design (15-20 minutes)
   - Deep dive into components
   - Database design
   - API design

4. Scale & Optimize (10-15 minutes)
   - Identify bottlenecks
   - Discuss scaling strategies
   - Address edge cases

5. Wrap-up (5 minutes)
   - Monitoring & alerting
   - Summary of trade-offs

Question Clarification Framework

class InterviewQuestionFramework:
    def clarify_requirements(self, problem_statement):
        clarifying_questions = {
            'functional_requirements': [
                "What are the core features we need to support?",
                "Who are the primary users?",
                "What platforms do we need to support?"
            ],
            'non_functional_requirements': [
                "How many users do we expect?",
                "What's the expected read/write ratio?",
                "What are the latency requirements?",
                "What's the availability requirement?"
            ],
            'constraints': [
                "Are there any budget constraints?",
                "Do we have any technology preferences?",
                "Are there compliance requirements?"
            ]
        }

        return clarifying_questions

    def define_success_metrics(self, requirements):
        return {
            'performance': ['Response time < 200ms', 'Throughput > 10K QPS'],
            'availability': ['99.9% uptime', 'Zero data loss'],
            'scalability': ['Support 10x growth', 'Handle traffic spikes']
        }
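
An availability target such as "99.9% uptime" is easier to reason about as a downtime budget. The sketch below does that arithmetic. It assumes a 365-day year, and the name `downtime_budget_hours` is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_budget_hours(availability_percent: float) -> float:
    """Return the allowed downtime per year, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {downtime_budget_hours(target):.2f} hours/year")
```

With these assumptions, 99.9% leaves roughly 8.76 hours of downtime per year, and each extra nine cuts the budget by a factor of ten.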

Common Interview Questions

URL Shortener Design

class URLShortenerInterview:
    def approach_problem(self):
        steps = {
            '1_requirements': {
                'functional': ['Shorten URLs', 'Redirect to original', 'Custom aliases'],
                'non_functional': ['100M URLs/month', '100:1 read/write', '<100ms latency'],
                'constraints': ['6-7 character short URLs', '5 year retention']
            },
            '2_capacity_estimation': {
                'storage': '100M URLs/month * 60 months * 500 bytes = 3TB',
                'qps': 'Write: 40/sec, Read: 4000/sec',
                'bandwidth': '4000 QPS * 500 bytes = 2MB/sec'
            },
            '3_system_design': {
                'components': ['Load Balancer', 'Web Servers', 'Database', 'Cache'],
                'database': 'SQL for ACID properties',
                'caching': 'Redis for hot URLs',
                'encoding': 'Base62 encoding'
            }
        }
        return steps
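
The Base62 step can be sketched as follows, assuming each URL is first assigned a unique integer ID (for example from a database sequence). The alphabet ordering and the function names are illustrative choices, not a fixed standard.

```python
import string

# Assumption: digits, then lowercase, then uppercase. Any fixed ordering of 62 symbols works.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(number: int) -> str:
    """Convert a non-negative integer ID into a Base62 short code."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(short_code: str) -> int:
    """Convert a Base62 short code back into its integer ID."""
    number = 0
    for char in short_code:
        number = number * BASE + ALPHABET.index(char)
    return number


if __name__ == "__main__":
    url_id = 123_456_789
    code = encode_base62(url_id)
    print(code, decode_base62(code) == url_id)
    print(f"6-character codes available: {BASE ** 6:,}")
```

Six characters give 62^6 = 56,800,235,584 distinct codes, which comfortably covers the 6 billion URLs implied by 100M URLs/month over 60 months.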

Chat System Design

class ChatSystemInterview:
    def design_approach(self):
        return {
            'architecture': {
                'websockets': 'Real-time bidirectional communication',
                'message_queue': 'Reliable message delivery',
                'presence_service': 'Online/offline status',
                'notification_service': 'Push notifications'
            },
            'database_design': {
                'message_table': 'Partitioned by chat_id and timestamp',
                'user_table': 'User profiles and settings',
                'chat_table': 'Chat metadata and participants'
            },
            'scaling': {
                'horizontal_scaling': 'Multiple WebSocket servers',
                'message_routing': 'Consistent hashing for user assignment',
                'data_partitioning': 'Shard by chat_id or user_id'
            }
        }
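
To make "consistent hashing for user assignment" concrete, here is a minimal hash ring with virtual nodes. It is a sketch under assumptions: the class name, the server names, the choice of SHA-256, and 100 virtual nodes per server are all illustrative.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys (e.g. user IDs) to servers so that adding or removing
    a server only remaps the keys that belonged to that server."""

    def __init__(self, servers=(), virtual_nodes=100):
        self._virtual_nodes = virtual_nodes
        self._positions = []  # sorted hash positions on the ring
        self._owners = {}     # hash position -> server name
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_server(self, server: str) -> None:
        # Each server is placed on the ring many times to even out the load.
        for i in range(self._virtual_nodes):
            position = self._hash(f"{server}#{i}")
            self._owners[position] = server
            bisect.insort(self._positions, position)

    def remove_server(self, server: str) -> None:
        for i in range(self._virtual_nodes):
            position = self._hash(f"{server}#{i}")
            if self._owners.get(position) == server:
                del self._owners[position]
                self._positions.remove(position)

    def get_server(self, key: str) -> str:
        """Return the first server clockwise from the key's position."""
        if not self._positions:
            raise LookupError("ring has no servers")
        index = bisect.bisect_right(self._positions, self._hash(key))
        return self._owners[self._positions[index % len(self._positions)]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["ws-1", "ws-2", "ws-3"])  # hypothetical WebSocket servers
    print(ring.get_server("user-42"))
```

Compared with `hash(user_id) % num_servers`, removing one WebSocket server here only reassigns the users that were connected to it, which is the property worth calling out in the interview.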

Design Patterns to Remember

Scalability Patterns

class ScalabilityPatterns:
    def __init__(self):
        self.patterns = {
            'horizontal_scaling': {
                'description': 'Add more servers to handle increased load',
                'when_to_use': 'When single server reaches capacity limits',
                'example': 'Load balancer distributing traffic across multiple web servers'
            },
            'caching': {
                'description': 'Store frequently accessed data in fast storage',
                'types': ['Browser cache', 'CDN', 'Application cache', 'Database cache'],
                'example': 'Redis caching database query results'
            },
            'database_sharding': {
                'description': 'Partition data across multiple databases',
                'strategies': ['Horizontal partitioning', 'Vertical partitioning', 'Functional partitioning'],
                'example': 'User data sharded by user_id % num_shards'
            },
            'microservices': {
                'description': 'Break monolith into smaller, independent services',
                'benefits': ['Independent scaling', 'Technology diversity', 'Fault isolation'],
                'challenges': ['Network latency', 'Data consistency', 'Service discovery']
            }
        }
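
Two of the patterns above are small enough to sketch directly: cache-aside reads with a TTL, and modulo-based shard routing. This is a simplified illustration. An in-memory dict stands in for Redis, and `CacheAside`, `load_from_db`, and `shard_for` are hypothetical names.

```python
import time


class CacheAside:
    """Cache-aside read path: check the cache first, fall back to the
    database on a miss, then populate the cache for later reads."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load_from_db = load_from_db  # hypothetical loader: key -> value
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value); stands in for Redis

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        value = self._load_from_db(key)  # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


def shard_for(user_id: int, num_shards: int) -> int:
    """Modulo sharding: simple, but changing num_shards remaps most keys."""
    return user_id % num_shards


if __name__ == "__main__":
    cache = CacheAside(lambda key: f"row-for-{key}", ttl_seconds=30)
    print(cache.get("user:1"), cache.get("user:1"))
    print(shard_for(42, 4))
```

The trade-off to mention for modulo sharding is resharding cost: changing `num_shards` moves most keys, which is why consistent hashing is often preferred when the shard count changes.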

Reliability Patterns

class ReliabilityPatterns:
    def __init__(self):
        self.patterns = {
            'circuit_breaker': {
                'purpose': 'Prevent cascading failures',
                'implementation': 'Monitor failure rate, open circuit when threshold exceeded',
                'states': ['Closed', 'Open', 'Half-Open']
            },
            'retry_with_backoff': {
                'purpose': 'Handle transient failures',
                'strategy': 'Exponential backoff with jitter',
                'max_retries': 3
            },
            'bulkhead_pattern': {
                'purpose': 'Isolate critical resources',
                'example': 'Separate connection pools for different services'
            },
            'graceful_degradation': {
                'purpose': 'Maintain core functionality during partial failures',
                'example': 'Show cached results when recommendation service is down'
            }
        }
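
The circuit breaker and retry-with-backoff entries can be sketched in a few lines. Note the simplification: this breaker counts consecutive failures rather than tracking a failure rate over a window. The class name, thresholds, and the `backoff_delay` helper are illustrative assumptions.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # allow one trial call through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call (half-open) or too many failures (closed)
            # opens the circuit and restarts the timeout.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def backoff_delay(attempt: int, base=0.1, cap=5.0, rng=random.random) -> float:
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2**attempt)) seconds for a zero-based attempt."""
    return rng() * min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    print([round(backoff_delay(n), 3) for n in range(4)])
```

Jitter matters because clients that fail at the same moment would otherwise retry at the same moment too. The breaker complements retries by stopping calls to a dependency that is clearly down.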

Database Selection Guidelines

SQL vs NoSQL Decision Framework

class DatabaseSelectionGuide:
    def choose_database(self, requirements):
        if requirements.needs_acid_transactions:
            return self.recommend_sql_database(requirements)
        elif requirements.needs_horizontal_scaling:
            return self.recommend_nosql_database(requirements)
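        else:
            # No hard constraint either way: return both option sets
            # and decide based on the data model.
            return {
                'sql': self.recommend_sql_database(requirements),
                'nosql': self.recommend_nosql_database(requirements)
            }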

    def recommend_sql_database(self, requirements):
        recommendations = {
            'postgresql': 'Complex queries, JSON support, ACID transactions',
            'mysql': 'Simple operations, proven reliability, wide adoption',
            'oracle': 'Enterprise features, complex analytics'
        }
        return recommendations

    def recommend_nosql_database(self, requirements):
        recommendations = {
            'mongodb': 'Document-based, flexible schema, rich queries',
            'cassandra': 'High write throughput, wide column store',
            'redis': 'In-memory, high performance, caching',
            'dynamodb': 'Serverless, managed, predictable performance'
        }
        return recommendations
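
The decision order in the framework can be exercised with a small standalone sketch. The `Requirements` dataclass, its `data_model` field, and `choose_database_family` are hypothetical names added for illustration. The fallback rule is one possible heuristic, not a definitive one.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_acid_transactions: bool = False
    needs_horizontal_scaling: bool = False
    # Hypothetical field: "relational", "document", "key-value", or "wide-column".
    data_model: str = "relational"


def choose_database_family(req: Requirements) -> str:
    """Mirror the decision order: ACID first, then scaling, then data shape."""
    if req.needs_acid_transactions:
        return "sql"
    if req.needs_horizontal_scaling:
        return "nosql"
    # Assumed fallback: let the shape of the data decide.
    return "sql" if req.data_model == "relational" else "nosql"


if __name__ == "__main__":
    payments = Requirements(needs_acid_transactions=True)
    activity_feed = Requirements(needs_horizontal_scaling=True, data_model="wide-column")
    print(choose_database_family(payments), choose_database_family(activity_feed))
```

In an interview the ordering is itself a talking point: many SQL systems can also scale horizontally, so treat the result as a starting position to justify, not a rule.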

Performance Calculation Templates

Back-of-Envelope Calculations

class PerformanceCalculations:
    def __init__(self):
        # Approximate latencies in nanoseconds (order-of-magnitude figures)
        self.latency_numbers = {
            'l1_cache': 1,           # 1 ns
            'l2_cache': 10,          # 10 ns
            'ram': 100,              # 100 ns
            'ssd': 100_000,          # 0.1 ms
            'network_same_dc': 1_000_000,      # 1 ms
            'disk': 10_000_000,      # 10 ms
            'network_cross_continent': 100_000_000  # 100 ms
        }

        # Approximate throughput in bytes per second
        self.throughput_numbers = {
            'ethernet_1gb': 125_000_000,        # 125 MB/s
            'ssd_sequential': 500_000_000,      # 500 MB/s
            'memory_bandwidth': 10_000_000_000, # 10 GB/s
        }

    def estimate_qps_capacity(self, avg_response_time_ms, server_count):
        # Little's law: throughput = concurrency / latency.
        # Assumes each server can handle 1000 concurrent connections.
        concurrent_per_server = 1000
        avg_response_time_s = avg_response_time_ms / 1000
        qps_per_server = concurrent_per_server / avg_response_time_s
        return qps_per_server * server_count

    def estimate_storage_requirements(self, daily_data_gb, retention_days, replication_factor):
        total_storage = daily_data_gb * retention_days * replication_factor
        return total_storage
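
As a worked example, the sketch below reproduces the URL shortener estimates used earlier in this guide: 100M URLs/month, a 100:1 read/write ratio, 500 bytes per record, and 60 months of retention. The function name and the 30-day month are assumptions.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def estimate_url_shortener(urls_per_month, read_write_ratio, bytes_per_url, retention_months):
    """Return rough write/read QPS, total storage, and read bandwidth."""
    write_qps = urls_per_month / SECONDS_PER_MONTH
    read_qps = write_qps * read_write_ratio
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "storage_bytes": urls_per_month * retention_months * bytes_per_url,
        "read_bandwidth_bytes_per_sec": read_qps * bytes_per_url,
    }


if __name__ == "__main__":
    estimate = estimate_url_shortener(
        urls_per_month=100_000_000,
        read_write_ratio=100,
        bytes_per_url=500,
        retention_months=60,
    )
    print(f"Write QPS: {estimate['write_qps']:.0f}")
    print(f"Read QPS:  {estimate['read_qps']:.0f}")
    print(f"Storage:   {estimate['storage_bytes'] / 1e12:.1f} TB")
    print(f"Bandwidth: {estimate['read_bandwidth_bytes_per_sec'] / 1e6:.1f} MB/s")
```

The exact write rate is about 38.6 per second, which rounds to the 40/sec figure. In an interview, rounding to one significant digit is usually enough.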

Common Mistakes to Avoid

Design Mistakes

class CommonInterviewMistakes:
    def __init__(self):
        self.mistakes_to_avoid = {
            'requirements_phase': [
                "Not asking clarifying questions",
                "Making assumptions without validation",
                "Jumping into design too quickly"
            ],
            'design_phase': [
                "Over-engineering the solution",
                "Not considering trade-offs",
                "Forgetting about data consistency",
                "Ignoring failure scenarios"
            ],
            'scaling_phase': [
                "Not identifying bottlenecks",
                "Premature optimization",
                "Not considering cost implications"
            ],
            'communication': [
                "Not explaining thought process",
                "Not engaging with interviewer",
                "Being too quiet or too verbose"
            ]
        }

Practice Problems

Beginner Level

1. Design a URL shortener (like bit.ly)
2. Design a simple chat application
3. Design a basic social media news feed
4. Design a file storage system (like Dropbox)
5. Design a basic search engine

Intermediate Level

1. Design Instagram
2. Design Uber/Lyft
3. Design Netflix
4. Design WhatsApp
5. Design Twitter

Advanced Level

1. Design Google Search
2. Design Amazon
3. Design YouTube
4. Design Facebook
5. Design distributed cache system

Interview Tips

Before the Interview

1. Review fundamental concepts
   - Scalability patterns
   - Database concepts
   - Caching strategies
   - Load balancing

2. Practice with timing
   - 45-60 minute mock interviews
   - Focus on time management
   - Practice drawing diagrams

3. Study real systems
   - Read engineering blogs
   - Understand how large systems work
   - Learn from case studies

During the Interview

1. Start with clarification
   - Ask good questions
   - Define scope clearly
   - Confirm understanding

2. Think out loud
   - Explain your reasoning
   - Discuss trade-offs
   - Engage with interviewer

3. Be systematic
   - Follow structured approach
   - Start simple, then add complexity
   - Consider all requirements

4. Handle feedback gracefully
   - Listen to hints
   - Adapt your design
   - Show flexibility

Communication Framework

class InterviewCommunication:
    def structure_explanation(self, component):
        return {
            'what': f"This is {component.name}",
            'why': f"We need this because {component.purpose}",
            'how': f"It works by {component.mechanism}",
            'alternatives': f"We could also use {component.alternatives}",
            'trade_offs': f"The trade-offs are {component.pros_and_cons}"
        }
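
To show how the what/why/how structure reads in practice, here is a standalone sketch with a hypothetical `Component` dataclass and a Redis cache as the example component. The field values are illustrative talking points, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Component:
    # Hypothetical container for the attributes the framework expects.
    name: str
    purpose: str
    mechanism: str
    alternatives: str
    pros_and_cons: str


def structure_explanation(component: Component) -> dict:
    """Build the five talking points for one component, in speaking order."""
    return {
        "what": f"This is {component.name}",
        "why": f"We need this because {component.purpose}",
        "how": f"It works by {component.mechanism}",
        "alternatives": f"We could also use {component.alternatives}",
        "trade_offs": f"The trade-offs are {component.pros_and_cons}",
    }


if __name__ == "__main__":
    cache = Component(
        name="a Redis cache in front of the database",
        purpose="reads outnumber writes and hot URLs are requested repeatedly",
        mechanism="serving hot keys from memory and loading misses from the database",
        alternatives="Memcached or an in-process cache",
        pros_and_cons="lower read latency versus possible stale data and extra operational cost",
    )
    for point in structure_explanation(cache).values():
        print(point)
```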

Resources to Study

Books

"Designing Data-Intensive Applications" by Martin Kleppmann
"System Design Interview" by Alex Xu
"Building Microservices" by Sam Newman

Online Resources

High Scalability blog
AWS Architecture Center
Google Cloud Architecture Framework
System design primer on GitHub

Practice Platforms

Pramp system design interviews
InterviewBit system design
Grokking the System Design Interview

Next Steps

This guide will be expanded with:

- More detailed case study walkthroughs
- Advanced scaling scenarios
- Real interview examples and feedback
- Industry-specific design patterns
- Mock interview templates