Lab 02: RDS Multi-AZ High Availability Setup

🎯 Lab Objectives

Learn how to set up and test an RDS Multi-AZ deployment for high availability and automatic failover.

Skills You'll Learn

  • Create an RDS instance with a Multi-AZ configuration
  • Configure security groups and subnet groups
  • Test automatic failover scenarios
  • Monitor RDS performance and health
  • Implement backup and restore strategies

📋 Prerequisites

  • An AWS account with permissions to create VPC, EC2, and RDS resources
  • Basic understanding of VPCs and subnets
  • Knowledge of database concepts

🏗️ Architecture Overview

Primary AZ (us-east-1a)        Standby AZ (us-east-1b)
┌─────────────────────┐      ┌─────────────────────┐
│   EC2 Instance      │      │                     │
│   (Web Server)      │      │                     │
│         │           │      │                     │
│         ▼           │      │                     │
│   RDS Primary       │◄────►│   RDS Standby       │
│   (MySQL)           │      │   (Sync Replica)    │
│         │           │      │                     │
│         ▼           │      │                     │
│   EBS Volume        │      │   EBS Volume        │
└─────────────────────┘      └─────────────────────┘

🔧 Step 1: Create VPC and Subnet Groups

1.1 Create VPC

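# Create VPC (note the VpcId in the output; it replaces vpc-xxxxxxxx below)
aws ec2 create-vpc \
    --cidr-block 10.0.0.0/16 \
    --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=RDS-Lab-VPC}]'

# Enable DNS hostnames (the attribute takes an explicit boolean value)
aws ec2 modify-vpc-attribute \
    --vpc-id vpc-xxxxxxxx \
    --enable-dns-hostnames "{\"Value\":true}"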
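
The commands in the rest of this lab use placeholders such as vpc-xxxxxxxx. One way to avoid copying IDs by hand is to capture them in shell variables. The following is a minimal sketch, assuming bash and a configured AWS CLI; the function names create_lab_vpc and is_vpc_id and the variable VPC_ID are illustrative and not part of the lab itself.

```shell
# Hypothetical helper: create the lab VPC and print only its ID.
# --query and --output are standard AWS CLI global options.
create_lab_vpc() {
    aws ec2 create-vpc \
        --cidr-block 10.0.0.0/16 \
        --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=RDS-Lab-VPC}]' \
        --query 'Vpc.VpcId' \
        --output text
}

# Hypothetical helper: succeed only if the argument looks like a real VPC ID
# ("vpc-" followed by lowercase hex digits), so unreplaced placeholders are caught.
is_vpc_id() {
    case "$1" in
        vpc-) return 1 ;;
        vpc-*[!0-9a-f]*) return 1 ;;
        vpc-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   VPC_ID=$(create_lab_vpc)
#   is_vpc_id "$VPC_ID" || echo "unexpected VPC ID: $VPC_ID" >&2
#   aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.1.0/24 ...
```

The same --query/--output pattern works for subnet, security group, and instance IDs, so later steps can take "$VPC_ID" instead of a pasted placeholder.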

1.2 Create Subnets

# Public subnet (for EC2)
aws ec2 create-subnet \
    --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.0.1.0/24 \
    --availability-zone us-east-1a \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Public-Subnet-1a}]'

# Private subnet 1 (for RDS)
aws ec2 create-subnet \
    --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.0.101.0/24 \
    --availability-zone us-east-1a \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Private-Subnet-1a}]'

# Private subnet 2 (for RDS Multi-AZ)
aws ec2 create-subnet \
    --vpc-id vpc-xxxxxxxx \
    --cidr-block 10.0.102.0/24 \
    --availability-zone us-east-1b \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Private-Subnet-1b}]'
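
A subnet is only reachable from the internet if its route table sends traffic to an internet gateway, and the SSH step later in this lab relies on that. The sketch below shows one way to wire this up for the public subnet, assuming bash and a configured AWS CLI. The function name make_subnet_public and its argument order are illustrative assumptions.

```shell
# Hypothetical helper: give one subnet a default route to a new internet gateway.
# Arguments: $1 = VPC ID, $2 = public subnet ID.
# Prints "<internet-gateway-id> <route-table-id>" on success.
make_subnet_public() {
    local vpc_id="$1" subnet_id="$2" igw_id rtb_id

    # Create the gateway and attach it to the VPC
    igw_id=$(aws ec2 create-internet-gateway \
        --query 'InternetGateway.InternetGatewayId' --output text) || return 1
    aws ec2 attach-internet-gateway \
        --internet-gateway-id "$igw_id" --vpc-id "$vpc_id" || return 1

    # Create a route table with a default route through the gateway
    rtb_id=$(aws ec2 create-route-table --vpc-id "$vpc_id" \
        --query 'RouteTable.RouteTableId' --output text) || return 1
    aws ec2 create-route --route-table-id "$rtb_id" \
        --destination-cidr-block 0.0.0.0/0 --gateway-id "$igw_id" >/dev/null || return 1

    # Associate the route table with the public subnet only
    aws ec2 associate-route-table \
        --route-table-id "$rtb_id" --subnet-id "$subnet_id" >/dev/null || return 1

    echo "$igw_id $rtb_id"
}

# Usage:
#   make_subnet_public vpc-xxxxxxxx subnet-xxxxxxxx
```

The private subnets keep the VPC's main route table, so the database stays unreachable from the internet. The gateway and route table created here must also be removed during cleanup before the VPC can be deleted.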

1.3 Create DB Subnet Group

# Use the IDs of the two private subnets (us-east-1a and us-east-1b)
aws rds create-db-subnet-group \
    --db-subnet-group-name rds-lab-subnet-group \
    --db-subnet-group-description "Subnet group for RDS lab" \
    --subnet-ids subnet-xxxxxxxx subnet-yyyyyyyy

🔒 Step 2: Configure Security Groups

2.1 RDS Security Group

# Create security group for RDS
aws ec2 create-security-group \
    --group-name rds-security-group \
    --description "Security group for RDS instances" \
    --vpc-id vpc-xxxxxxxx

# Allow MySQL access from the EC2 security group
# (sg-yyyyyyyy is created in 2.2, so create that group before adding this rule)
aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp \
    --port 3306 \
    --source-group sg-yyyyyyyy

2.2 EC2 Security Group

# Create security group for EC2
aws ec2 create-security-group \
    --group-name ec2-security-group \
    --description "Security group for EC2 instances" \
    --vpc-id vpc-xxxxxxxx

# Allow SSH access (0.0.0.0/0 opens port 22 to the whole internet; prefer your own IP as a /32)
aws ec2 authorize-security-group-ingress \
    --group-id sg-yyyyyyyy \
    --protocol tcp \
    --port 22 \
    --cidr 0.0.0.0/0

# Allow HTTP access
aws ec2 authorize-security-group-ingress \
    --group-id sg-yyyyyyyy \
    --protocol tcp \
    --port 80 \
    --cidr 0.0.0.0/0

🗄️ Step 3: Create RDS Multi-AZ Instance

3.1 Create Parameter Group

aws rds create-db-parameter-group \
    --db-parameter-group-name mysql-lab-params \
    --db-parameter-group-family mysql8.0 \
    --description "Custom parameter group for lab"

# Modify parameters for monitoring
aws rds modify-db-parameter-group \
    --db-parameter-group-name mysql-lab-params \
    --parameters "ParameterName=slow_query_log,ParameterValue=1,ApplyMethod=immediate" \
                 "ParameterName=long_query_time,ParameterValue=1,ApplyMethod=immediate"

3.2 Create RDS Instance

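# Notes:
#   - At the time of writing, Performance Insights is not offered for MySQL on
#     the smallest burstable classes (such as db.t3.micro), so this command uses
#     db.t3.medium. To stay on db.t3.micro, drop --enable-performance-insights.
#   - The monitoring role must already exist with the
#     AmazonRDSEnhancedMonitoringRole managed policy attached; replace "account"
#     in the ARN with your 12-digit account ID.
aws rds create-db-instance \
    --db-instance-identifier rds-lab-mysql \
    --db-instance-class db.t3.medium \
    --engine mysql \
    --engine-version 8.0.35 \
    --master-username admin \
    --master-user-password 'MySecurePassword123!' \
    --allocated-storage 20 \
    --storage-type gp2 \
    --multi-az \
    --db-subnet-group-name rds-lab-subnet-group \
    --vpc-security-group-ids sg-xxxxxxxx \
    --db-parameter-group-name mysql-lab-params \
    --backup-retention-period 7 \
    --preferred-backup-window "03:00-04:00" \
    --preferred-maintenance-window "sun:04:00-sun:05:00" \
    --storage-encrypted \
    --enable-performance-insights \
    --monitoring-interval 60 \
    --monitoring-role-arn arn:aws:iam::account:role/rds-monitoring-role \
    --tags Key=Name,Value=RDS-Lab-MySQL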
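
Creating a Multi-AZ instance takes several minutes, and the endpoint is only useful once the instance reports the available state. The sketch below blocks until then and prints the endpoint hostname. It assumes bash and a configured AWS CLI; the function name wait_for_endpoint is an illustrative assumption.

```shell
# Hypothetical helper: wait until the instance is available, then print its endpoint.
# Argument: $1 = DB instance identifier.
wait_for_endpoint() {
    local db_id="$1"

    # "aws rds wait" polls describe-db-instances until the state matches or it times out
    aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1

    # Print only the endpoint hostname
    aws rds describe-db-instances \
        --db-instance-identifier "$db_id" \
        --query 'DBInstances[0].Endpoint.Address' \
        --output text
}

# Usage:
#   DB_HOST=$(wait_for_endpoint rds-lab-mysql)
#   echo "Connect to: $DB_HOST"
```

The printed hostname is the value to use wherever this lab shows rds-lab-mysql.xxxxxxxx.us-east-1.rds.amazonaws.com.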

💻 Step 4: Launch EC2 Instance

4.1 Launch EC2

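# AMI IDs are region-specific and are retired over time; confirm this one is
# still available in your region. Use the public subnet's ID for --subnet-id.
aws ec2 run-instances \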
    --image-id ami-0c02fb55956c7d316 \
    --instance-type t2.micro \
    --key-name your-key-pair \
    --security-group-ids sg-yyyyyyyy \
    --subnet-id subnet-xxxxxxxx \
    --associate-public-ip-address \
    --user-data file://userdata.sh \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=RDS-Lab-EC2}]'

4.2 UserData Script (userdata.sh)

#!/bin/bash
yum update -y
yum install -y mysql
amazon-linux-extras install -y lamp-mariadb10.2-php7.2

# Install CloudWatch agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
rpm -U ./amazon-cloudwatch-agent.rpm

# Create test script
cat > /home/ec2-user/test-db.sh << 'EOF'
#!/bin/bash
# Replace with the endpoint reported by describe-db-instances
DB_HOST="rds-lab-mysql.xxxxxxxx.us-east-1.rds.amazonaws.com"
DB_USER="admin"
DB_PASS="MySecurePassword123!"

mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" -e "SHOW STATUS LIKE 'Uptime';"
EOF

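chmod +x /home/ec2-user/test-db.sh
# User data runs as root, so hand the script over to ec2-user
chown ec2-user:ec2-user /home/ec2-user/test-db.sh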
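
Passing the password with -p on the command line exposes it in the process list and makes the client print a warning. A common alternative is a client option file that only its owner can read. The sketch below writes such a file; it assumes bash, and the function name write_mysql_client_config and the file path in the usage note are illustrative assumptions.

```shell
# Hypothetical helper: write a MySQL client option file readable only by its owner.
# Arguments: $1 = output path, $2 = host, $3 = user, $4 = password.
# Passwords containing a double quote or backslash would need extra escaping.
write_mysql_client_config() {
    local path="$1" host="$2" user="$3" password="$4"

    # Create (or empty) the file with owner-only permissions before writing the secret
    ( umask 077 && : > "$path" ) || return 1
    chmod 600 "$path" || return 1

    printf '[client]\nhost=%s\nuser=%s\npassword="%s"\n' \
        "$host" "$user" "$password" > "$path"
}

# Usage (--defaults-extra-file must be the first option on the mysql command line):
#   write_mysql_client_config "$HOME/.rds-lab.cnf" "$DB_HOST" admin 'MySecurePassword123!'
#   mysql --defaults-extra-file="$HOME/.rds-lab.cnf" -e "SHOW STATUS LIKE 'Uptime';"
```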

🧪 Step 5: Test Database Connectivity

5.1 Connect to Database

# SSH to EC2 instance
ssh -i your-key.pem ec2-user@ec2-public-ip

# Test connection
mysql -h rds-lab-mysql.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p

-- Create a test database and table (run these statements inside the mysql client)
CREATE DATABASE testdb;
USE testdb;
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO users (name, email) VALUES 
('John Doe', 'john@example.com'),
('Jane Smith', 'jane@example.com');

SELECT * FROM users;

5.2 Monitor RDS Status

# Check RDS instance status
aws rds describe-db-instances --db-instance-identifier rds-lab-mysql

# Check Multi-AZ status
aws rds describe-db-instances \
    --db-instance-identifier rds-lab-mysql \
    --query 'DBInstances[0].[MultiAZ,AvailabilityZone,SecondaryAvailabilityZone]'

🚨 Step 6: Test Failover Scenarios

6.1 Initiate Manual Failover

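# Force failover to test Multi-AZ
# (start the connection test from 6.2 in a second terminal first to observe the interruption)
aws rds reboot-db-instance \
    --db-instance-identifier rds-lab-mysql \
    --force-failover

# Monitor failover progress
# (the -d option requires GNU date; the Z suffix marks the timestamp as UTC)
aws rds describe-events \
    --source-identifier rds-lab-mysql \
    --source-type db-instance \
    --start-time "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)"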
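
After a Multi-AZ failover, the primary and standby swap Availability Zones, so comparing the primary's AZ before and after the reboot is a quick way to confirm that a failover really happened. The sketch below assumes bash and a configured AWS CLI; the function names current_primary_az and report_failover are illustrative assumptions.

```shell
# Hypothetical helper: print the AZ that currently hosts the primary.
# Argument: $1 = DB instance identifier.
current_primary_az() {
    aws rds describe-db-instances \
        --db-instance-identifier "$1" \
        --query 'DBInstances[0].AvailabilityZone' \
        --output text
}

# Hypothetical helper: compare the AZ recorded before and after the forced failover.
# Arguments: $1 = AZ before, $2 = AZ after. Returns non-zero if nothing changed.
report_failover() {
    local before="$1" after="$2"
    if [ "$before" != "$after" ]; then
        echo "Failover completed: primary moved from $before to $after"
    else
        echo "No AZ change detected: primary is still in $before"
        return 1
    fi
}

# Usage:
#   BEFORE=$(current_primary_az rds-lab-mysql)
#   aws rds reboot-db-instance --db-instance-identifier rds-lab-mysql --force-failover
#   aws rds wait db-instance-available --db-instance-identifier rds-lab-mysql
#   report_failover "$BEFORE" "$(current_primary_az rds-lab-mysql)"
```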

6.2 Test Application During Failover

# Run continuous connection test
cat > /home/ec2-user/failover-test.sh << 'EOF'
#!/bin/bash
DB_HOST="rds-lab-mysql.xxxxxxxx.us-east-1.rds.amazonaws.com"
DB_USER="admin"
DB_PASS="MySecurePassword123!"

echo "Starting failover test..."
while true; do
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
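    # --connect-timeout keeps a hung connection attempt from stalling the loop
    result=$(mysql --connect-timeout=5 -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" -e "SELECT 1" 2>&1)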

    if [[ $? -eq 0 ]]; then
        echo "[$timestamp] SUCCESS: Database connection OK"
    else
        echo "[$timestamp] FAILED: $result"
    fi

    sleep 5
done
EOF

chmod +x /home/ec2-user/failover-test.sh
./failover-test.sh
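
To turn the test output into a number, the log can be scanned for the first FAILED line and the first SUCCESS line that follows it. The sketch below does this for the line format printed by failover-test.sh. It assumes bash and GNU date, and that the output was saved to a file (for example with tee); the function name outage_seconds and the file name failover.log are illustrative assumptions.

```shell
# Hypothetical helper: read failover-test.sh output on stdin and print the number
# of seconds between the first FAILED line and the next SUCCESS line.
# Returns non-zero if the log contains no complete outage.
outage_seconds() {
    local first_fail="" recovered="" line ts
    while IFS= read -r line; do
        ts=${line#\[}      # drop the leading "["
        ts=${ts%%\]*}      # keep the text before the first "]"
        case "$line" in
            *"] FAILED:"*)
                if [ -z "$first_fail" ]; then first_fail=$ts; fi ;;
            *"] SUCCESS:"*)
                if [ -n "$first_fail" ]; then recovered=$ts; break; fi ;;
        esac
    done
    if [ -z "$first_fail" ] || [ -z "$recovered" ]; then
        return 1
    fi
    # Convert both timestamps to epoch seconds and subtract
    echo $(( $(date -d "$recovered" +%s) - $(date -d "$first_fail" +%s) ))
}

# Usage:
#   ./failover-test.sh | tee failover.log    # stop with Ctrl+C after recovery
#   outage_seconds < failover.log
```

Because the test loop sleeps 5 seconds between attempts, the result is only accurate to within roughly one polling interval.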

📊 Step 7: Monitor Performance

7.1 CloudWatch Metrics

# Get CPU utilization
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name CPUUtilization \
    --dimensions Name=DBInstanceIdentifier,Value=rds-lab-mysql \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Average

# Get database connections
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBInstanceIdentifier,Value=rds-lab-mysql \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Average

7.2 Performance Insights

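-- Run these statements in the mysql client
-- Top 10 statement digests by average latency (timer values are in picoseconds)
SELECT DIGEST_TEXT, COUNT_STAR, AVG_TIMER_WAIT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC LIMIT 10;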

-- Monitor connection status
SHOW PROCESSLIST;

-- Check InnoDB status
SHOW ENGINE INNODB STATUS;

🔄 Step 8: Backup and Restore Testing

8.1 Create Manual Snapshot

aws rds create-db-snapshot \
    --db-instance-identifier rds-lab-mysql \
    --db-snapshot-identifier rds-lab-snapshot-$(date +%Y%m%d%H%M%S)

8.2 Test Point-in-Time Recovery

# Restore to a specific time (it must fall inside the backup retention window
# and be no later than the instance's LatestRestorableTime)
aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier rds-lab-mysql \
    --target-db-instance-identifier rds-lab-restored \
    --restore-time "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --db-subnet-group-name rds-lab-subnet-group \
    --vpc-security-group-ids sg-xxxxxxxx
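
A point-in-time restore fails if the requested time is later than the most recent restorable time, which typically lags a few minutes behind the present. The sketch below looks up that limit and checks a requested time against it. It assumes bash, GNU date, and a configured AWS CLI; the function names latest_restorable_time and restore_time_ok are illustrative assumptions. If the newest possible state is all you need, the --use-latest-restorable-time option can replace --restore-time.

```shell
# Hypothetical helper: print the most recent time the instance can be restored to.
# Argument: $1 = DB instance identifier.
latest_restorable_time() {
    aws rds describe-db-instances \
        --db-instance-identifier "$1" \
        --query 'DBInstances[0].LatestRestorableTime' \
        --output text
}

# Hypothetical helper: succeed if the requested time is not after the latest
# restorable time. Both arguments are ISO 8601 timestamps.
restore_time_ok() {
    local requested="$1" latest="$2"
    [ "$(date -u -d "$requested" +%s)" -le "$(date -u -d "$latest" +%s)" ]
}

# Usage:
#   WANTED=$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)
#   LATEST=$(latest_restorable_time rds-lab-mysql)
#   restore_time_ok "$WANTED" "$LATEST" || echo "Pick a time at or before $LATEST" >&2
```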

📝 Step 9: Cleanup Resources

9.1 Delete RDS Resources

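# Delete restored instance
aws rds delete-db-instance \
    --db-instance-identifier rds-lab-restored \
    --skip-final-snapshot

# Delete main instance (the final snapshot is kept and billed until you delete it)
aws rds delete-db-instance \
    --db-instance-identifier rds-lab-mysql \
    --final-db-snapshot-identifier rds-lab-final-snapshot

# Wait for both deletions; the subnet and parameter groups cannot be removed while in use
aws rds wait db-instance-deleted --db-instance-identifier rds-lab-restored
aws rds wait db-instance-deleted --db-instance-identifier rds-lab-mysql

# Delete subnet group
aws rds delete-db-subnet-group \
    --db-subnet-group-name rds-lab-subnet-group

# Delete parameter group
aws rds delete-db-parameter-group \
    --db-parameter-group-name mysql-lab-params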

9.2 Delete VPC Resources

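# Terminate EC2 instance and wait; its security group cannot be deleted while attached
aws ec2 terminate-instances --instance-ids i-xxxxxxxx
aws ec2 wait instance-terminated --instance-ids i-xxxxxxxx

# Delete security groups (RDS group first, because its rule references the EC2 group)
aws ec2 delete-security-group --group-id sg-xxxxxxxx
aws ec2 delete-security-group --group-id sg-yyyyyyyy

# Delete subnets (all three created in 1.2)
aws ec2 delete-subnet --subnet-id subnet-xxxxxxxx
aws ec2 delete-subnet --subnet-id subnet-yyyyyyyy
aws ec2 delete-subnet --subnet-id subnet-zzzzzzzz

# Delete VPC
aws ec2 delete-vpc --vpc-id vpc-xxxxxxxx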

📚 Key Learnings

Multi-AZ Benefits

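  • Automatic failover: RDS promotes the standby without manual intervention
  • Synchronous replication: Committed writes exist in both AZs, so failover is designed to avoid data loss
  • Single endpoint: The application keeps using the same connection string
  • Backup from standby: Automated backups are taken from the standby, which reduces I/O impact on the primary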

Failover Characteristics

  • Failover time: Typically 1-2 minutes
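  • DNS update: The endpoint name stays the same; its DNS record is repointed to the new primary
  • Connection handling: Existing connections are dropped, so applications need reconnect and retry logic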
  • Performance: Brief interruption during failover

Best Practices

  • Enable Multi-AZ for production workloads
  • Use connection pooling in applications
  • Implement proper retry logic
  • Monitor failover events
  • Test failover procedures regularly
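
Retry logic is what lets a client ride out the one to two minutes of a failover instead of failing on the first dropped connection. The sketch below shows one common form, retrying a command with exponentially growing pauses. It assumes bash; the function name retry_with_backoff and its argument order are illustrative assumptions, and a production application would normally use the retry features of its database driver or connection pool instead.

```shell
# Hypothetical helper: run a command until it succeeds, doubling the pause
# after each failure.
# Arguments: $1 = maximum attempts, $2 = initial delay in seconds, rest = command.
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while :; do
        # Run the command exactly as given; stop on the first success
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))
    done
}

# Usage (up to 8 attempts with pauses of 1, 2, 4, ... seconds):
#   retry_with_backoff 8 1 \
#       mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" -e "SELECT 1"
```

With 8 attempts and a 1-second initial delay, the pauses add up to about two minutes, which roughly matches the typical failover time noted above.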

🎓 Assessment Questions

  1. What happens to the database endpoint during failover?
  2. How long does a typical Multi-AZ failover take?
  3. What's the difference between Multi-AZ and Read Replicas?
  4. When would you choose Single-AZ vs Multi-AZ?

This lab provides hands-on experience with RDS Multi-AZ deployment and failover testing, essential skills for the AWS Solutions Architect Associate certification.