MySQL - Câu Hỏi Phỏng Vấn

📝 Basic Level (Junior Developer)

1. SQL Fundamentals

Q: Sự khác biệt giữa INNER JOIN và LEFT JOIN là gì?

-- INNER JOIN: Chỉ trả về rows có match ở cả 2 tables
SELECT u.name, o.order_id 
FROM users u 
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN: Trả về tất cả rows từ left table, NULL nếu không match
SELECT u.name, o.order_id 
FROM users u 
LEFT JOIN orders o ON u.id = o.user_id;

Q: Primary Key và Foreign Key khác nhau như thế nào? - Primary Key: Unique identifier cho mỗi row, không null, chỉ có 1 per table - Foreign Key: Reference tới primary key của table khác, maintain referential integrity

Q: Giải thích các data types trong MySQL?

-- Numeric
INT(11)         -- 4 bytes integer
DECIMAL(10,2)   -- Exact decimal with precision
FLOAT           -- Approximate decimal

-- String  
VARCHAR(255)    -- Variable length string
CHAR(10)        -- Fixed length string
TEXT            -- Large text data

-- Date/Time
DATE            -- YYYY-MM-DD
DATETIME        -- YYYY-MM-DD HH:MM:SS
TIMESTAMP       -- Auto-updating timestamp

2. Indexing Basics

Q: Index là gì và tại sao cần thiết? - Definition: Cấu trúc dữ liệu giúp tăng tốc độ truy vấn - Benefits: Faster SELECT, WHERE, ORDER BY - Drawbacks: Slower INSERT/UPDATE/DELETE, extra storage

Q: Tạo index như thế nào?

-- Single column index
CREATE INDEX idx_email ON users(email);

-- Composite index
CREATE INDEX idx_name_age ON users(last_name, first_name, age);

-- Unique index
CREATE UNIQUE INDEX idx_username ON users(username);

3. Basic Queries

Q: Viết query tìm top 5 customers có nhiều orders nhất?

SELECT 
    c.customer_name,
    COUNT(o.order_id) as total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY total_orders DESC
LIMIT 5;

🔧 Intermediate Level (Mid-level Developer)

1. Performance Optimization

Q: Làm thế nào để optimize một slow query?

-- 1. Analyze query execution
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

-- 2. Add appropriate indexes
CREATE INDEX idx_email ON users(email);

-- 3. Rewrite query if needed
-- Before: SELECT * FROM users WHERE YEAR(created_at) = 2023
-- After: SELECT * FROM users WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'

-- 4. Use LIMIT when appropriate
SELECT * FROM large_table WHERE condition LIMIT 100;

Q: Giải thích query execution plan?

EXPLAIN FORMAT=JSON SELECT * FROM users u 
JOIN orders o ON u.id = o.user_id 
WHERE u.status = 'active';

-- Key fields to check:
-- - type: ALL (bad), range (good), const (best)
-- - possible_keys: Available indexes
-- - key: Actually used index
-- - rows: Estimated rows examined
-- - Extra: Additional information

2. Transactions & Isolation

Q: ACID properties là gì? - Atomicity: All or nothing - transaction hoàn toàn thành công hoặc rollback - Consistency: Database luôn ở valid state - Isolation: Concurrent transactions không ảnh hưởng lẫn nhau - Durability: Committed changes persist sau system failure

Q: Isolation levels trong MySQL?

-- READ UNCOMMITTED - Dirty reads possible
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- READ COMMITTED - No dirty reads
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- REPEATABLE READ (Default) - No dirty reads, no non-repeatable reads
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- SERIALIZABLE - Highest isolation
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

3. Storage Engines

Q: Sự khác biệt giữa InnoDB và MyISAM?

Feature InnoDB MyISAM
Transactions Yes No
Foreign Keys Yes No
Locking Row-level Table-level
Crash Recovery Yes No
ACID Yes No
Performance Better for writes Better for reads

Q: Khi nào sử dụng Memory storage engine?

CREATE TABLE session_data (
    session_id VARCHAR(128) PRIMARY KEY,
    data TEXT,
    last_access TIMESTAMP
) ENGINE=MEMORY;

-- Use cases:
-- - Temporary tables
-- - Cache data
-- - Session storage
-- - Lookup tables

🚀 Advanced Level (Senior Developer/DBA)

1. High Availability & Replication

Q: Giải thích MySQL replication workflow?

-- Master logs changes to binary log
-- Slave connects and requests binary log events
-- Slave applies changes to relay log
-- Slave executes relay log events

-- Setup replication:
-- On Master:
CREATE USER 'repl'@'slave_ip' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave_ip';

-- On Slave:
CHANGE MASTER TO
    MASTER_HOST='master_ip',
    MASTER_USER='repl',
    MASTER_PASSWORD='password',
    MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=4;
START SLAVE;

Q: Làm thế nào handle replication lag?

-- Monitor replication lag
SHOW SLAVE STATUS\G
-- Check Seconds_Behind_Master

-- Solutions:
-- 1. Parallel replication
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';

-- 2. Read from master for critical data
-- 3. Use semi-synchronous replication
-- 4. Optimize slow queries on slave

2. Performance Tuning

Q: Key MySQL configuration parameters cần tune?

[mysqld]
# Memory settings
innodb_buffer_pool_size = 1G          # 70-80% of RAM
innodb_log_file_size = 256M            # 25% of buffer pool
innodb_log_buffer_size = 16M           

# Connection settings
max_connections = 200                  # Based on workload
thread_cache_size = 8                  # Reduce thread creation

# Query cache (deprecated in 8.0)
query_cache_type = ON
query_cache_size = 32M

# InnoDB settings
innodb_flush_log_at_trx_commit = 1     # ACID compliance
innodb_file_per_table = ON             # Better space management

Q: Cách monitor MySQL performance?

-- Key metrics to monitor
SHOW GLOBAL STATUS LIKE 'Questions';          -- QPS
SHOW GLOBAL STATUS LIKE 'Threads_connected';  -- Active connections
SHOW GLOBAL STATUS LIKE 'Slow_queries';       -- Slow query count

-- Buffer pool efficiency
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- Hit ratio = (requests - reads) / requests * 100

-- Performance Schema queries
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC LIMIT 10;

3. Scaling Strategies

Q: Database sharding strategies?

# Range-based sharding
def get_shard_by_range(user_id):
    if user_id <= 1000000:
        return 'shard1'
    elif user_id <= 2000000:
        return 'shard2'
    else:
        return 'shard3'

# Hash-based sharding  
def get_shard_by_hash(user_id):
    return f'shard{user_id % 4}'

# Directory-based sharding
# Lookup table maps entity_id -> shard_name

Q: Vertical vs Horizontal partitioning? - Vertical: Split table by columns (normalization) - Horizontal: Split table by rows (sharding/partitioning)

-- Horizontal partitioning by range
CREATE TABLE orders (
    id INT,
    order_date DATE,
    customer_id INT
) PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

🔍 Troubleshooting Scenarios

Scenario 1: Slow Query Performance

Q: Application suddenly slow, how to diagnose?

-- 1. Check current running queries
SHOW PROCESSLIST;

-- 2. Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- 3. Analyze slow queries
-- Use pt-query-digest or mysqldumpslow

-- 4. Check for lock contention
SHOW ENGINE INNODB STATUS;

-- 5. Monitor key metrics
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_waits';

Scenario 2: Replication Failure

Q: Slave replication stopped, how to fix?

-- 1. Check slave status
SHOW SLAVE STATUS\G

-- 2. Common issues and fixes:
-- Duplicate key error:
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;

-- Network connectivity:
-- Check network and firewall

-- Binary log corruption:
-- Reset slave and resync from master

-- 3. Monitor replication health
-- Check Seconds_Behind_Master regularly

Scenario 3: High CPU Usage

Q: MySQL using 100% CPU, how to investigate?

-- 1. Identify expensive queries
SELECT 
    DIGEST_TEXT,
    COUNT_STAR,
    AVG_TIMER_WAIT/1000000000 as avg_seconds,
    SUM_TIMER_WAIT/1000000000 as total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;

-- 2. Check for lock contention
SHOW ENGINE INNODB STATUS;

-- 3. Look for inefficient joins
-- Use EXPLAIN on suspicious queries

-- 4. Check connection usage
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Threads_running';

💡 Best Practices Interview Questions

Q: Database design best practices? 1. Normalization: Eliminate redundancy (3NF minimum) 2. Appropriate data types: Use smallest suitable type 3. Indexing strategy: Index foreign keys and frequently queried columns 4. Constraints: Use NOT NULL, UNIQUE, CHECK constraints 5. Naming conventions: Consistent, descriptive names

Q: Backup and recovery strategy?

# Full backup
mysqldump --single-transaction --routines --triggers \
          --all-databases > full_backup.sql

# Point-in-time recovery
mysqlbinlog mysql-bin.000001 mysql-bin.000002 | mysql

# Physical backup with Percona XtraBackup
xtrabackup --backup --target-dir=/backup/
xtrabackup --prepare --target-dir=/backup/

Q: Security best practices? 1. Principle of least privilege: Grant minimal necessary permissions 2. Strong passwords: Enforce password complexity 3. SSL/TLS: Encrypt connections 4. Regular updates: Keep MySQL version current 5. Firewall: Restrict network access 6. Audit logging: Monitor database access

🎯 Coding Challenges

Challenge 1: Find Duplicate Records

-- Find users with duplicate emails
SELECT email, COUNT(*) as count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Delete duplicates keeping the latest
DELETE u1 FROM users u1
INNER JOIN users u2
WHERE u1.id < u2.id AND u1.email = u2.email;

Challenge 2: Running Totals

-- Calculate running total of sales
SELECT 
    order_date,
    daily_sales,
    SUM(daily_sales) OVER (ORDER BY order_date) as running_total
FROM (
    SELECT 
        DATE(order_date) as order_date,
        SUM(amount) as daily_sales
    FROM orders
    GROUP BY DATE(order_date)
) daily_totals;

Challenge 3: Top N per Group

-- Top 3 highest paid employees per department
SELECT 
    department,
    employee_name,
    salary
FROM (
    SELECT 
        department,
        employee_name,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
    FROM employees
) ranked
WHERE rn <= 3;

Những câu hỏi này cover từ basic đến advanced, chuẩn bị tốt cho mọi level phỏng vấn.