MySQL - Câu Hỏi Phỏng Vấn
📝 Basic Level (Junior Developer)
1. SQL Fundamentals
Q: Sự khác biệt giữa INNER JOIN và LEFT JOIN là gì?
-- INNER JOIN: Chỉ trả về rows có match ở cả 2 tables
SELECT u.name, o.order_id
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- LEFT JOIN: Trả về tất cả rows từ left table, NULL nếu không match
SELECT u.name, o.order_id
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
Q: Primary Key và Foreign Key khác nhau như thế nào? - Primary Key: Unique identifier cho mỗi row, không null, chỉ có 1 per table - Foreign Key: Reference tới primary key của table khác, maintain referential integrity
Q: Giải thích các data types trong MySQL?
-- Numeric
INT(11) -- 4 bytes integer
DECIMAL(10,2) -- Exact decimal with precision
FLOAT -- Approximate decimal
-- String
VARCHAR(255) -- Variable length string
CHAR(10) -- Fixed length string
TEXT -- Large text data
-- Date/Time
DATE -- YYYY-MM-DD
DATETIME -- YYYY-MM-DD HH:MM:SS
TIMESTAMP -- Auto-updating timestamp
2. Indexing Basics
Q: Index là gì và tại sao cần thiết? - Definition: Cấu trúc dữ liệu giúp tăng tốc độ truy vấn - Benefits: Faster SELECT, WHERE, ORDER BY - Drawbacks: Slower INSERT/UPDATE/DELETE, extra storage
Q: Tạo index như thế nào?
-- Single column index
CREATE INDEX idx_email ON users(email);
-- Composite index
CREATE INDEX idx_name_age ON users(last_name, first_name, age);
-- Unique index
CREATE UNIQUE INDEX idx_username ON users(username);
3. Basic Queries
Q: Viết query tìm top 5 customers có nhiều orders nhất?
SELECT
c.customer_name,
COUNT(o.order_id) as total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY total_orders DESC
LIMIT 5;
🔧 Intermediate Level (Mid-level Developer)
1. Performance Optimization
Q: Làm thế nào để optimize một slow query?
-- 1. Analyze query execution
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
-- 2. Add appropriate indexes
CREATE INDEX idx_email ON users(email);
-- 3. Rewrite query if needed
-- Before: SELECT * FROM users WHERE YEAR(created_at) = 2023
-- After: SELECT * FROM users WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'
-- 4. Use LIMIT when appropriate
SELECT * FROM large_table WHERE condition LIMIT 100;
Q: Giải thích query execution plan?
EXPLAIN FORMAT=JSON SELECT * FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';
-- Key fields to check:
-- - type: ALL (bad), range (good), const (best)
-- - possible_keys: Available indexes
-- - key: Actually used index
-- - rows: Estimated rows examined
-- - Extra: Additional information
2. Transactions & Isolation
Q: ACID properties là gì? - Atomicity: All or nothing - transaction hoàn toàn thành công hoặc rollback - Consistency: Database luôn ở valid state - Isolation: Concurrent transactions không ảnh hưởng lẫn nhau - Durability: Committed changes persist sau system failure
Q: Isolation levels trong MySQL?
-- READ UNCOMMITTED - Dirty reads possible
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- READ COMMITTED - No dirty reads
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- REPEATABLE READ (Default) - No dirty reads, no non-repeatable reads
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- SERIALIZABLE - Highest isolation
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
3. Storage Engines
Q: Sự khác biệt giữa InnoDB và MyISAM?
| Feature | InnoDB | MyISAM |
|---|---|---|
| Transactions | Yes | No |
| Foreign Keys | Yes | No |
| Locking | Row-level | Table-level |
| Crash Recovery | Yes | No |
| ACID | Yes | No |
| Performance | Better for writes | Better for reads |
Q: Khi nào sử dụng Memory storage engine?
CREATE TABLE session_data (
session_id VARCHAR(128) PRIMARY KEY,
data TEXT,
last_access TIMESTAMP
) ENGINE=MEMORY;
-- Use cases:
-- - Temporary tables
-- - Cache data
-- - Session storage
-- - Lookup tables
🚀 Advanced Level (Senior Developer/DBA)
1. High Availability & Replication
Q: Giải thích MySQL replication workflow?
-- Master logs changes to binary log
-- Slave connects and requests binary log events
-- Slave applies changes to relay log
-- Slave executes relay log events
-- Setup replication:
-- On Master:
CREATE USER 'repl'@'slave_ip' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave_ip';
-- On Slave:
CHANGE MASTER TO
MASTER_HOST='master_ip',
MASTER_USER='repl',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=4;
START SLAVE;
Q: Làm thế nào handle replication lag?
-- Monitor replication lag
SHOW SLAVE STATUS\G
-- Check Seconds_Behind_Master
-- Solutions:
-- 1. Parallel replication
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
-- 2. Read from master for critical data
-- 3. Use semi-synchronous replication
-- 4. Optimize slow queries on slave
2. Performance Tuning
Q: Key MySQL configuration parameters cần tune?
[mysqld]
# Memory settings
innodb_buffer_pool_size = 1G # 70-80% of RAM
innodb_log_file_size = 256M # 25% of buffer pool
innodb_log_buffer_size = 16M
# Connection settings
max_connections = 200 # Based on workload
thread_cache_size = 8 # Reduce thread creation
# Query cache (deprecated in 8.0)
query_cache_type = ON
query_cache_size = 32M
# InnoDB settings
innodb_flush_log_at_trx_commit = 1 # ACID compliance
innodb_file_per_table = ON # Better space management
Q: Cách monitor MySQL performance?
-- Key metrics to monitor
SHOW GLOBAL STATUS LIKE 'Questions'; -- QPS
SHOW GLOBAL STATUS LIKE 'Threads_connected'; -- Active connections
SHOW GLOBAL STATUS LIKE 'Slow_queries'; -- Slow query count
-- Buffer pool efficiency
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- Hit ratio = (requests - reads) / requests * 100
-- Performance Schema queries
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC LIMIT 10;
3. Scaling Strategies
Q: Database sharding strategies?
# Range-based sharding
def get_shard_by_range(user_id):
if user_id <= 1000000:
return 'shard1'
elif user_id <= 2000000:
return 'shard2'
else:
return 'shard3'
# Hash-based sharding
def get_shard_by_hash(user_id):
return f'shard{user_id % 4}'
# Directory-based sharding
# Lookup table maps entity_id -> shard_name
Q: Vertical vs Horizontal partitioning? - Vertical: Split table by columns (normalization) - Horizontal: Split table by rows (sharding/partitioning)
-- Horizontal partitioning by range
CREATE TABLE orders (
id INT,
order_date DATE,
customer_id INT
) PARTITION BY RANGE (YEAR(order_date)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025)
);
🔍 Troubleshooting Scenarios
Scenario 1: Slow Query Performance
Q: Application suddenly slow, how to diagnose?
-- 1. Check current running queries
SHOW PROCESSLIST;
-- 2. Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
-- 3. Analyze slow queries
-- Use pt-query-digest or mysqldumpslow
-- 4. Check for lock contention
SHOW ENGINE INNODB STATUS;
-- 5. Monitor key metrics
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_waits';
Scenario 2: Replication Failure
Q: Slave replication stopped, how to fix?
-- 1. Check slave status
SHOW SLAVE STATUS\G
-- 2. Common issues and fixes:
-- Duplicate key error:
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
-- Network connectivity:
-- Check network and firewall
-- Binary log corruption:
-- Reset slave and resync from master
-- 3. Monitor replication health
-- Check Seconds_Behind_Master regularly
Scenario 3: High CPU Usage
Q: MySQL using 100% CPU, how to investigate?
-- 1. Identify expensive queries
SELECT
DIGEST_TEXT,
COUNT_STAR,
AVG_TIMER_WAIT/1000000000 as avg_seconds,
SUM_TIMER_WAIT/1000000000 as total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
-- 2. Check for lock contention
SHOW ENGINE INNODB STATUS;
-- 3. Look for inefficient joins
-- Use EXPLAIN on suspicious queries
-- 4. Check connection usage
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Threads_running';
💡 Best Practices Interview Questions
Q: Database design best practices? 1. Normalization: Eliminate redundancy (3NF minimum) 2. Appropriate data types: Use smallest suitable type 3. Indexing strategy: Index foreign keys and frequently queried columns 4. Constraints: Use NOT NULL, UNIQUE, CHECK constraints 5. Naming conventions: Consistent, descriptive names
Q: Backup and recovery strategy?
# Full backup
mysqldump --single-transaction --routines --triggers \
--all-databases > full_backup.sql
# Point-in-time recovery
mysqlbinlog mysql-bin.000001 mysql-bin.000002 | mysql
# Physical backup with Percona XtraBackup
xtrabackup --backup --target-dir=/backup/
xtrabackup --prepare --target-dir=/backup/
Q: Security best practices? 1. Principle of least privilege: Grant minimal necessary permissions 2. Strong passwords: Enforce password complexity 3. SSL/TLS: Encrypt connections 4. Regular updates: Keep MySQL version current 5. Firewall: Restrict network access 6. Audit logging: Monitor database access
🎯 Coding Challenges
Challenge 1: Find Duplicate Records
-- Find users with duplicate emails
SELECT email, COUNT(*) as count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Delete duplicates keeping the latest
DELETE u1 FROM users u1
INNER JOIN users u2
WHERE u1.id < u2.id AND u1.email = u2.email;
Challenge 2: Running Totals
-- Calculate running total of sales
SELECT
order_date,
daily_sales,
SUM(daily_sales) OVER (ORDER BY order_date) as running_total
FROM (
SELECT
DATE(order_date) as order_date,
SUM(amount) as daily_sales
FROM orders
GROUP BY DATE(order_date)
) daily_totals;
Challenge 3: Top N per Group
-- Top 3 highest paid employees per department
SELECT
department,
employee_name,
salary
FROM (
SELECT
department,
employee_name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM employees
) ranked
WHERE rn <= 3;
Những câu hỏi này cover từ basic đến advanced, chuẩn bị tốt cho mọi level phỏng vấn.