Amazon CloudWatch - Comprehensive Guide

📊 CloudWatch Overview

What is CloudWatch?

  • Monitoring service: Collects and tracks metrics, logs, and events
  • Real-time monitoring: Near real-time data processing
  • Alerting: Automated responses to metric thresholds
  • Dashboards: Visual monitoring interfaces
  • Integration: Deep integration with AWS services

📈 CloudWatch Metrics

Default Metrics

EC2 Metrics:
  - CPUUtilization
  - NetworkIn/NetworkOut
  - DiskReadOps/DiskWriteOps
  - StatusCheckFailed

RDS Metrics:
  - DatabaseConnections
  - CPUUtilization
  - FreeableMemory
  - ReadLatency/WriteLatency

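S3 Metrics:
  - BucketSizeBytes (daily storage metric)
  - NumberOfObjects (daily storage metric)
  - AllRequests (request metric; must be enabled per bucket and is billed like a custom metric)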
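
To read one of these default metrics programmatically, a request for `get_metric_statistics` can be assembled as sketched below. The helper name `cpu_stats_request`, the instance ID, and the one-hour window are illustrative assumptions; only the parameter names come from the CloudWatch API. The helper returns a plain dict so it can be inspected without AWS access.

```python
from datetime import datetime, timedelta, timezone


def cpu_stats_request(instance_id, end_time, hours=1, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Hypothetical helper: nothing is sent to AWS here.
    """
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period,  # seconds covered by each returned datapoint
        "Statistics": ["Average", "Maximum"],
    }


# Example instance ID and timestamp are placeholders.
request = cpu_stats_request(
    "i-0123456789abcdef0",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)

# With credentials configured, the request could be sent like this:
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_statistics(**request)
#   datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

Datapoints are not guaranteed to come back in time order, which is why the commented usage sorts them by `Timestamp`.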

Custom Metrics

import boto3

cloudwatch = boto3.client('cloudwatch')

# Put custom metric
cloudwatch.put_metric_data(
    Namespace='MyApp/Performance',
    MetricData=[
        {
            'MetricName': 'ResponseTime',
            'Value': 0.5,
            'Unit': 'Seconds',
            'Dimensions': [
                {
                    'Name': 'Environment',
                    'Value': 'Production'
                }
            ]
        }
    ]
)
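
When many samples are published, or when 1-second granularity is needed, the same call can be fed with batches of high-resolution datapoints. The following is a minimal sketch: `build_datum` and `chunk` are hypothetical helpers, and the batch size of 2 is chosen only to keep the demo small. `StorageResolution` is the `put_metric_data` field that selects high-resolution (1) or standard (60) storage.

```python
def build_datum(name, value, unit, environment, high_resolution=False):
    """Return one MetricData entry for put_metric_data (hypothetical helper)."""
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Dimensions": [{"Name": "Environment", "Value": environment}],
        # 1 = high-resolution (1-second) storage, 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }


def chunk(items, size):
    """Split a list into batches so each put_metric_data call stays small."""
    return [items[i:i + size] for i in range(0, len(items), size)]


samples = [0.42, 0.51, 0.38, 0.95, 0.47]  # made-up response times in seconds
data = [
    build_datum("ResponseTime", s, "Seconds", "Production", high_resolution=True)
    for s in samples
]
batches = chunk(data, 2)

# With credentials configured:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   for batch in batches:
#       cloudwatch.put_metric_data(Namespace="MyApp/Performance", MetricData=batch)
```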

🚨 CloudWatch Alarms

Alarm Types

Metric Alarms:
  Purpose: Monitor single metric
  Actions: SNS, Auto Scaling, EC2 actions
  States: OK, ALARM, INSUFFICIENT_DATA

Composite Alarms:
  Purpose: Combine multiple alarms
  Logic: AND, OR, NOT conditions over the states of other alarms
  Use case: Complex monitoring scenarios
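
A composite alarm is defined by a rule expression that refers to its child alarms by name. The sketch below builds such an expression; `alarm_rule` is a hypothetical helper and `High-Memory-Usage` is an assumed second alarm that does not appear elsewhere in this guide.

```python
def alarm_rule(operator, alarm_names):
    """Join child alarms into a composite AlarmRule expression.

    operator is "AND" or "OR"; each child is wrapped in ALARM("name"),
    which is true while that child alarm is in the ALARM state.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


rule = alarm_rule("AND", ["High-CPU-Usage", "High-Memory-Usage"])
print(rule)  # ALARM("High-CPU-Usage") AND ALARM("High-Memory-Usage")

# With credentials configured:
#   import boto3
#   boto3.client("cloudwatch").put_composite_alarm(
#       AlarmName="High-CPU-And-Memory", AlarmRule=rule
#   )
```

Combining alarms with AND is a common way to cut noise: the composite fires only when both symptoms are present at once.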

Alarm Configuration

{
  "AlarmName": "High-CPU-Usage",
  "MetricName": "CPUUtilization",
  "Namespace": "AWS/EC2",
  "Statistic": "Average",
  "Period": 300,
  "EvaluationPeriods": 2,
  "Threshold": 80,
  "ComparisonOperator": "GreaterThanThreshold",
  "AlarmActions": [
    "arn:aws:sns:region:account:topic-name"
  ]
}
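
To make the interplay of `Period`, `EvaluationPeriods`, and `Threshold` concrete, here is a simplified model of how the configuration above reaches each state. It is a sketch for intuition only: `evaluate_alarm` is a hypothetical function, and it ignores missing-data handling and "M out of N" (`DatapointsToAlarm`) settings that the real service supports.

```python
def evaluate_alarm(datapoints, threshold=80, evaluation_periods=2):
    """Simplified model of a GreaterThanThreshold metric alarm.

    datapoints: one aggregated value per period (e.g. 5-minute averages),
    oldest first. The alarm fires only if every one of the most recent
    `evaluation_periods` values is strictly above the threshold.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"


print(evaluate_alarm([50, 85, 90]))  # ALARM: the last two periods both exceed 80
print(evaluate_alarm([85, 90, 70]))  # OK: the most recent period is below 80
print(evaluate_alarm([95]))          # INSUFFICIENT_DATA: only one period so far
```

With `Period` set to 300 and `EvaluationPeriods` set to 2, this means CPU must stay above 80% for roughly ten minutes before the alarm fires.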

📝 CloudWatch Logs

Log Groups & Streams

Log Groups:
  Purpose: Container for log streams
  Retention: 1 day to 10 years, or never expire (the default)
  Encryption: KMS encryption support

Log Streams:
  Purpose: Sequence of log events
  Source: Single source (instance, function)
  Ordering: Chronological order
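
Retention is set per log group and only specific day counts are accepted. The sketch below picks the shortest allowed retention that still satisfies a requirement. `pick_retention` is a hypothetical helper, and the list of allowed values is an assumption based on the `put_retention_policy` API reference at the time of writing, so verify it before relying on it.

```python
# Day counts accepted by put_retention_policy (assumed; check the API reference).
VALID_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def pick_retention(required_days):
    """Return the smallest allowed retention that covers required_days."""
    for days in VALID_RETENTION_DAYS:
        if days >= required_days:
            return days
    raise ValueError("requirement exceeds 10 years; leave the group at 'never expire'")


retention = pick_retention(45)  # a 45-day requirement rounds up to 60 days

# With credentials configured ("/my/app" is a placeholder log group name):
#   import boto3
#   boto3.client("logs").put_retention_policy(
#       logGroupName="/my/app", retentionInDays=retention
#   )
```

Choosing the shortest retention that meets the requirement keeps storage charges down, since stored logs are billed per GB per month.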

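Logs Insights

# Count ERROR messages in 5-minute buckets
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(5m)

# Show the 100 most recent ERROR events (assumes structured logs with level and message fields)
fields @timestamp, level, message
| filter level = "ERROR"
| sort @timestamp desc
| limit 100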
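
Queries like these can also be run from code. Logs Insights queries are asynchronous: one call starts the query and another is polled for results. The sketch below shows that pattern; `run_insights_query` is a hypothetical helper that takes the client as a parameter (anything that behaves like `boto3.client("logs")`), and the polling interval and limit are arbitrary choices.

```python
import time


def run_insights_query(logs_client, log_group, query, start_time, end_time,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes.

    start_time and end_time are epoch seconds.
    Returns result rows as dicts mapping field name to value.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete within the polling limit")


# With credentials configured ("/my/app" is a placeholder log group name):
#   import boto3
#   now = int(time.time())
#   rows = run_insights_query(
#       boto3.client("logs"), "/my/app",
#       "fields @timestamp, @message | filter @message like /ERROR/ | limit 20",
#       start_time=now - 3600, end_time=now,
#   )
```

Because queries are billed by the amount of data scanned, narrowing the time range is usually the simplest way to keep query cost down.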

📊 CloudWatch Dashboards

Dashboard Components

Widgets:
  - Line charts: Time-series data
  - Number widgets: Single metric value  
  - Text widgets: Markdown documentation
  - Log widgets: Logs Insights query results

Features:
  - Custom time ranges
  - Auto-refresh
  - Sharing capabilities
  - Multiple regions support
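
Dashboards are defined by a JSON body that lists widgets and their grid positions. The sketch below assembles a body with one text widget and one line chart. The helper names, dashboard name, Region, and instance ID are illustrative assumptions; the widget fields follow the dashboard body structure that `put_dashboard` accepts.

```python
import json


def metric_widget(title, metrics, x, y, region="us-east-1"):
    """Line chart of one or more metrics (12 grid columns wide, 6 rows tall)."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "view": "timeSeries",
            # Each entry: [namespace, metric name, dimension name, dimension value, ...]
            "metrics": metrics,
            "stat": "Average",
            "period": 300,
            "region": region,
        },
    }


def text_widget(markdown, x, y):
    """Markdown note, useful for runbook links or ownership details."""
    return {
        "type": "text",
        "x": x, "y": y, "width": 12, "height": 3,
        "properties": {"markdown": markdown},
    }


body = {
    "widgets": [
        text_widget("# MyApp overview", x=0, y=0),
        metric_widget(
            "EC2 CPU",
            [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],
            x=0, y=3,
        ),
    ]
}
dashboard_body = json.dumps(body)

# With credentials configured:
#   import boto3
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="MyApp-Overview", DashboardBody=dashboard_body
#   )
```

Keeping the dashboard definition in code makes it easy to review changes and to recreate the same dashboard in another account or Region.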

CloudWatch Events/EventBridge

Event Sources

AWS Services:
  - EC2 state changes
  - Auto Scaling group events
  - CodePipeline state changes
  - Scheduled events (cron)

Custom Applications:
  - Custom event patterns
  - Third-party integrations
  - API Gateway integration

Event Rules

{
  "Rules": [
    {
      "Name": "EC2InstanceStateChange",
      "EventPattern": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {
          "state": ["running", "stopped"]
        }
      },
      "Targets": [
        {
          "Id": "1",
          "Arn": "arn:aws:lambda:region:account:function:ProcessEC2Event"
        }
      ]
    }
  ]
}
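
To build intuition for how the event pattern above selects events, here is a simplified local matcher. It is a sketch, not the service's matching engine: `matches` is a hypothetical function that covers only exact-value lists and nested objects, and it omits the richer operators (prefix, numeric, anything-but, and so on) that EventBridge supports. The sample events are made up.

```python
def matches(pattern, event):
    """Simplified EventBridge-style matching.

    Every key in the pattern must exist in the event. A list in the pattern
    matches if the event's value equals one of the listed values; a nested
    dict is matched recursively. Extra fields in the event are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "stopped"]},
}

stopped = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
terminated = {**stopped, "detail": {**stopped["detail"], "state": "terminated"}}

print(matches(pattern, stopped))     # True
print(matches(pattern, terminated))  # False: "terminated" is not in the state list
```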

🔍 CloudWatch Container Insights

ECS Monitoring

Container Insights:
  Metrics collected:
    - CPU and memory utilization
    - Network and disk I/O
    - Task and service metrics

  Log collection:
    - Container logs
    - Application logs
    - Performance metrics

EKS Monitoring

Kubernetes Metrics:
  - Cluster utilization
  - Node performance
  - Pod metrics
  - Namespace resource usage

Log Sources:
  - Application logs
  - Kubernetes API server logs
  - Controller manager logs

💰 CloudWatch Pricing

Cost Components

Metrics:
  Basic monitoring (AWS services): Free
  Custom and detailed monitoring: First 10 metrics free (Free Tier), then $0.30/metric/month
  High-resolution: Same $0.30/metric/month rate as other custom metrics

Logs:
  Ingestion: $0.50/GB
  Storage: $0.03/GB/month
  Insights queries: $0.005/GB scanned

Alarms:
  Standard: $0.10/alarm/month
  High-resolution: $0.30/alarm/month
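
The rates above can be turned into a rough monthly estimate. The sketch below is back-of-the-envelope arithmetic only: `estimate_monthly_cost` is a hypothetical helper, the workload numbers are invented, and it assumes the rates listed in this section, which vary by Region and change over time. It applies the 10 free metrics but ignores other free-tier allowances and volume discounts.

```python
# Rates copied from the list above (USD); treat them as assumptions.
PRICES = {
    "custom_metric": 0.30,   # per metric per month
    "log_ingest_gb": 0.50,   # per GB ingested
    "log_storage_gb": 0.03,  # per GB stored per month
    "insights_gb": 0.005,    # per GB scanned by Logs Insights
    "standard_alarm": 0.10,  # per alarm per month
    "high_res_alarm": 0.30,  # per alarm per month
}
FREE_METRICS = 10


def estimate_monthly_cost(custom_metrics=0, ingest_gb=0, stored_gb=0,
                          scanned_gb=0, standard_alarms=0, high_res_alarms=0):
    """Return an approximate monthly CloudWatch bill in USD."""
    billable_metrics = max(0, custom_metrics - FREE_METRICS)
    total = (
        billable_metrics * PRICES["custom_metric"]
        + ingest_gb * PRICES["log_ingest_gb"]
        + stored_gb * PRICES["log_storage_gb"]
        + scanned_gb * PRICES["insights_gb"]
        + standard_alarms * PRICES["standard_alarm"]
        + high_res_alarms * PRICES["high_res_alarm"]
    )
    return round(total, 2)


# Invented workload: 50 custom metrics, 100 GB ingested, 200 GB stored,
# 500 GB scanned, 20 standard alarms, 5 high-resolution alarms.
cost = estimate_monthly_cost(50, 100, 200, 500, 20, 5)
print(cost)  # 74.0  (12.00 + 50.00 + 6.00 + 2.50 + 2.00 + 1.50)
```

In this example log ingestion accounts for about two thirds of the total, which is typical of why log volume is the first thing to examine when CloudWatch costs grow.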

🛠️ Monitoring Best Practices

Metric Strategy

Golden Signals:
  Latency: Response time metrics
  Traffic: Request rate metrics  
  Errors: Error rate metrics
  Saturation: Resource utilization

Business Metrics:
  - Revenue-impacting KPIs
  - User experience metrics
  - Application-specific metrics

Alert Design

Effective Alerting:
  - Clear severity levels
  - Actionable notifications
  - Avoid alert fatigue
  - Include context information

Escalation:
  - Primary on-call
  - Backup contacts
  - Manager escalation
  - Automated remediation

🚀 Integration Patterns

Lambda Integration

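import json
import boto3

# Create the client once per execution environment so warm invocations reuse it
auto_scaling = boto3.client('autoscaling')

def lambda_handler(event, context):
    # The alarm notification arrives via SNS; its details are a JSON string
    message = json.loads(event['Records'][0]['Sns']['Message'])
    alarm_name = message['AlarmName']
    state = message['NewStateValue']

    if state == 'ALARM':
        # Remediate by raising capacity (group name and size are example values)
        auto_scaling.set_desired_capacity(
            AutoScalingGroupName='my-asg',
            DesiredCapacity=5
        )

    # Return a small summary so the invocation result shows what was handled
    return {'alarm': alarm_name, 'state': state}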

Auto Scaling Integration

CloudWatch → Auto Scaling:
  Trigger: CPU utilization > 70%
  Action: Scale out (add instances)

  Trigger: CPU utilization < 30% 
  Action: Scale in (remove instances)

  Cooldown: Prevent rapid scaling
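
The thresholds and cooldown above can be modelled in a few lines to see how they interact. This is a sketch for intuition only: in practice the decision is made by Auto Scaling policies attached to CloudWatch alarms, not by application code. `scaling_decision`, the one-instance step, the size limits, and the 300-second cooldown are all illustrative assumptions.

```python
def scaling_decision(cpu_percent, current, now, last_scaled_at,
                     min_size=1, max_size=10, cooldown_seconds=300):
    """Return the desired capacity for a simple threshold-based policy.

    now and last_scaled_at are timestamps in seconds; last_scaled_at is
    None if the group has never scaled.
    """
    in_cooldown = (
        last_scaled_at is not None and now - last_scaled_at < cooldown_seconds
    )
    if in_cooldown:
        return current  # ignore triggers until the last change has settled
    if cpu_percent > 70:
        return min(current + 1, max_size)  # scale out
    if cpu_percent < 30:
        return max(current - 1, min_size)  # scale in
    return current


print(scaling_decision(85, current=3, now=1000, last_scaled_at=None))  # 4
print(scaling_decision(85, current=3, now=1000, last_scaled_at=900))   # 3 (cooldown)
print(scaling_decision(20, current=3, now=1000, last_scaled_at=100))   # 2
```

The gap between the two thresholds (30% to 70%) matters as much as the cooldown: if they were close together, the group would oscillate between scaling out and scaling in.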

📝 Exam Tips for AWS SAA

Key Concepts

  • Default vs custom metrics: What's included vs extra cost
  • Alarm states: OK, ALARM, INSUFFICIENT_DATA
  • Log retention: Cost implications of long retention
  • High-resolution metrics: Down to 1-second granularity; high-resolution alarms (10- or 30-second periods) cost more than standard alarms

Common Scenarios

  • Auto Scaling: CloudWatch metrics trigger scaling
  • Cost optimization: Right-size based on metrics
  • Troubleshooting: Use logs and metrics together
  • Compliance: Log retention for audit requirements

📖 Summary

CloudWatch provides a comprehensive monitoring solution for AWS:

  • Metrics collection from AWS services and custom applications
  • Near real-time alerting with automated response capabilities
  • Log aggregation and analysis with Logs Insights
  • Visualization with customizable dashboards
  • Integration with other AWS services for automation

Understanding CloudWatch's capabilities and pricing model is essential for designing cost-effective monitoring solutions on the AWS SAA exam.