MongoDB - Pros and Cons

✅ Advantages

1. 🚀 Schema Flexibility

  • Dynamic Schema: No need to define a schema up front
  • Easy Evolution: New fields can be added without a migration
  • Mixed Data Types: A single collection can hold documents with different shapes and field types
  • Rapid Prototyping: Faster development iteration
// Documents with different structures can be inserted into the same collection
db.users.insertMany([
  {
    "name": "John",
    "email": "john@example.com"
  },
  {
    "name": "Alice", 
    "email": "alice@example.com",
    "age": 25,
    "preferences": {
      "newsletter": true,
      "theme": "dark"
    }
  }
])
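
Because the collection does not enforce one shape, application code usually has to tolerate missing fields. A minimal sketch of that idea, assuming a hypothetical helper `normalizeUser` and illustrative default values (neither is part of MongoDB):

```javascript
// Hypothetical defaults, chosen only for illustration.
const DEFAULT_PREFERENCES = { newsletter: false, theme: "light" };

// Return a copy of a user document with optional fields filled in,
// so downstream code can rely on a single shape.
function normalizeUser(doc) {
  return {
    name: doc.name,
    email: doc.email,
    age: doc.age ?? null, // age is optional in the sample documents
    preferences: { ...DEFAULT_PREFERENCES, ...(doc.preferences ?? {}) },
  };
}

// The two documents from the insertMany example above.
const users = [
  { name: "John", email: "john@example.com" },
  {
    name: "Alice",
    email: "alice@example.com",
    age: 25,
    preferences: { newsletter: true, theme: "dark" },
  },
].map(normalizeUser);

console.log(users);
```

Handling defaults at read time keeps old documents valid without a migration, at the cost of pushing shape checks into the application.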

2. 📄 Document-Oriented Storage

  • Natural Object Mapping: Documents map directly to application objects
  • Embedded Documents: Fewer joins, which can improve read performance
  • Rich Data Types: Arrays, nested objects, dates
  • JSON-like Structure: Easy to read and work with
// Complex nested structure
{
  "_id": ObjectId("..."),
  "name": "E-commerce Order",
  "customer": {
    "name": "John Doe",
    "email": "john@example.com",
    "address": {
      "street": "123 Main St",
      "city": "New York",
      "zipCode": "10001"
    }
  },
  "items": [
    {
      "productId": "prod1",
      "name": "Laptop",
      "price": 999.99,
      "quantity": 1
    },
    {
      "productId": "prod2", 
      "name": "Mouse",
      "price": 29.99,
      "quantity": 2
    }
  ],
  "totalAmount": 1059.97,
  "orderDate": ISODate("2024-01-15T10:30:00Z")
}
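
Since the line items are embedded, a derived field such as totalAmount can be checked from the document itself, with no second query. A small sketch, assuming a hypothetical helper `computeTotal` that sums in integer cents to avoid floating-point drift:

```javascript
// Order shaped like the sample document (only the fields needed here).
const order = {
  items: [
    { productId: "prod1", name: "Laptop", price: 999.99, quantity: 1 },
    { productId: "prod2", name: "Mouse", price: 29.99, quantity: 2 },
  ],
  totalAmount: 1059.97,
};

// Sum in integer cents, then convert back to a decimal amount.
function computeTotal(items) {
  const cents = items.reduce(
    (sum, item) => sum + Math.round(item.price * 100) * item.quantity,
    0
  );
  return cents / 100;
}

const computed = computeTotal(order.items);
console.log(computed === order.totalAmount);
```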

3. 🔍 Powerful Query Language

  • Rich Query Operators: $and, $or, $in, $regex, etc.
  • Aggregation Framework: Complex analytics and data processing
  • Indexing Support: Compound, partial, text, geospatial indexes
  • Full-text Search: Built-in text search capabilities
// Complex aggregation pipeline
db.sales.aggregate([
  {
    $match: {
      "date": { $gte: ISODate("2024-01-01") },
      "status": "completed"
    }
  },
  {
    $group: {
      "_id": {
        "month": { $month: "$date" },
        "category": "$productCategory"
      },
      "totalSales": { $sum: "$amount" },
      "averageOrder": { $avg: "$amount" }
    }
  },
  {
    $sort: { "totalSales": -1 }
  }
])
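
To make the stages concrete, the same $match, $group, and $sort logic can be written over an in-memory array. This is only an illustration of what the pipeline computes, not how the server executes it; the sample data and the helper name `summarize` are assumptions.

```javascript
// Tiny in-memory sample; field names follow the pipeline above.
const sales = [
  { date: new Date("2024-01-10T00:00:00Z"), status: "completed", productCategory: "laptops", amount: 1000 },
  { date: new Date("2024-01-20T00:00:00Z"), status: "completed", productCategory: "laptops", amount: 500 },
  { date: new Date("2024-02-05T00:00:00Z"), status: "completed", productCategory: "mice", amount: 30 },
  { date: new Date("2024-02-06T00:00:00Z"), status: "pending", productCategory: "mice", amount: 30 },
  { date: new Date("2023-12-31T00:00:00Z"), status: "completed", productCategory: "mice", amount: 30 },
];

function summarize(docs, since) {
  const groups = new Map();
  for (const doc of docs) {
    // $match: keep completed sales on or after `since`
    if (doc.date < since || doc.status !== "completed") continue;
    // $group key: month (1-12, in UTC) plus category
    const month = doc.date.getUTCMonth() + 1;
    const key = `${month}|${doc.productCategory}`;
    const group = groups.get(key) ?? {
      _id: { month, category: doc.productCategory },
      totalSales: 0,
      count: 0,
    };
    group.totalSales += doc.amount; // $sum
    group.count += 1;
    groups.set(key, group);
  }
  // $avg, then $sort by totalSales descending
  return [...groups.values()]
    .map(({ count, ...group }) => ({ ...group, averageOrder: group.totalSales / count }))
    .sort((a, b) => b.totalSales - a.totalSales);
}

const report = summarize(sales, new Date("2024-01-01T00:00:00Z"));
console.log(report);
```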

4. 📈 Horizontal Scalability

  • Automatic Sharding: Distributes data across multiple machines
  • Near-linear Scaling: Add nodes to increase capacity, provided the shard key spreads load evenly
  • Load Distribution: Automatic balancing across shards
  • High Availability: Replica sets with automatic failover
// Sharding setup
sh.enableSharding("ecommerce")
sh.shardCollection("ecommerce.orders", { "customerId": "hashed" })
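
Why a hashed key helps can be sketched outside the database: hashing turns sequential ids into evenly spread values. The code below is an illustration only. It uses MD5 modulo a shard count, which is a simplification and not MongoDB's routing (the server maps hashed values to chunk ranges and the balancer places chunks); `bucketFor` and `distribution` are hypothetical helpers.

```javascript
const crypto = require("crypto");

// Map a shard key value to one of `shardCount` buckets by hashing it.
// Simplified stand-in for hashed sharding, not the server's algorithm.
function bucketFor(key, shardCount) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  return digest.readUInt32BE(0) % shardCount;
}

// Count how many keys land in each bucket.
function distribution(keys, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const key of keys) counts[bucketFor(key, shardCount)] += 1;
  return counts;
}

// Sequential ids would cluster under a ranged key; hashing spreads them out.
const customerIds = Array.from({ length: 10000 }, (_, i) => `cust-${i}`);
const counts = distribution(customerIds, 4);
console.log(counts);
```

The trade-off is that range queries on customerId can no longer target a single shard, because neighbouring ids hash to unrelated buckets.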

5. ⚡ Performance Benefits

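  • Fast Reads: Optimized for read-heavy workloads
  • Storage Engine: WiredTiger provides an in-memory cache, compression, and document-level concurrency (memory-mapped files belonged to the older MMAPv1 engine)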
  • Index Optimization: Automatic index selection
  • Connection Pooling: Efficient connection management

6. 🛠️ Developer Experience

  • Easy to Learn: Familiar JSON-like syntax
  • Rich Ecosystem: Extensive drivers and tools
  • Active Community: Large community support
  • Cloud Integration: MongoDB Atlas (managed service); Amazon DocumentDB offers a MongoDB-compatible API but is a separate engine

7. 🔄 Replication & HA

  • Replica Sets: Automatic failover and data redundancy
  • Read Scaling: Reads can be served from secondary nodes
  • Oplog: Powers change streams for real-time updates
  • Geographic Distribution: Cross-datacenter replication
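
Read scaling and durability are usually chosen per connection. The sketch below builds a replica-set connection string with the standard options replicaSet, readPreference, and w. The host names, the set name rs0, and the helper `buildReplicaSetUri` are placeholders assumed for illustration.

```javascript
// Build a replica-set connection string from its parts.
// Hosts and the set name are placeholders, not values from a real deployment.
function buildReplicaSetUri({ hosts, replicaSet, readPreference, w }) {
  const params = new URLSearchParams({ replicaSet, readPreference, w });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

const uri = buildReplicaSetUri({
  hosts: ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
  replicaSet: "rs0",
  readPreference: "secondaryPreferred", // reads may hit secondaries and can be stale
  w: "majority", // writes wait for acknowledgement from a majority of members
});

console.log(uri);
```

Reading from secondaries trades freshness for throughput, so it suits dashboards and reports better than read-your-own-write flows.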

❌ Disadvantages

1. 💾 Memory Usage

  • High Memory Requirements: Performance drops sharply when the working set does not fit in RAM
  • Index Overhead: Indexes consume significant memory
  • Document Overhead: The BSON format adds per-document overhead
  • Memory Pressure: Large datasets can cause heavy cache eviction and memory-hungry sorts
// Memory monitoring
db.serverStatus().mem
db.serverStatus().wiredTiger.cache
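
For capacity planning it helps to know the default size of the WiredTiger internal cache: the larger of 50% of (RAM minus 1 GB) and 256 MB, unless it is overridden with storage.wiredTiger.engineConfig.cacheSizeGB. A small sketch of that arithmetic; the helper names and the working-set figures are assumptions for illustration.

```javascript
// Default WiredTiger internal cache size, in GB, for a host with `ramGB` of RAM:
// the larger of 50% of (RAM - 1 GB) and 256 MB.
function defaultWiredTigerCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

// Rough check: does an estimated working set fit in the default cache?
// The working-set size has to be estimated separately (data plus hot indexes).
function fitsInDefaultCache(workingSetGB, ramGB) {
  return workingSetGB <= defaultWiredTigerCacheGB(ramGB);
}

console.log(defaultWiredTigerCacheGB(16));
console.log(fitsInDefaultCache(10, 16));
```

The filesystem cache also holds compressed data, so a working set slightly above this number is not an immediate failure, only a sign that reads will start touching disk more often.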

2. 🔗 ACID Limitations

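  • Single Document ACID: Atomicity is guaranteed per document; multi-document transactions (4.0+) carry a performance cost
  • Eventual Consistency: Replication is asynchronous by default, so reads from secondaries can be stale
  • Transaction Overhead: Multi-document transactions are expensive, especially across shards
  • Limited Isolation: The default read concern ("local") can return data that is not yet majority-committed and may be rolled back
// Transaction performance impact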
const session = db.getMongo().startSession()
session.startTransaction() // Performance overhead
// ... operations
session.commitTransaction() // Network round trips

3. 📊 Join Performance

  • No Native Joins: $lookup (a left outer join in the aggregation pipeline) can be expensive
  • Denormalization Required: Encourages data duplication
  • Complex Relationships: Difficult to model complex relations
  • Referential Integrity: No foreign key constraints
// Expensive $lookup operation
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",     // Expensive operation
      localField: "customerId",
      foreignField: "_id",
      as: "customer"
    }
  }
])
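
What $lookup computes can be sketched in memory as a left outer join. The version below indexes the foreign side once, which mirrors why an index on foreignField matters: without it, every local document triggers a scan of the other collection. The sample data and the helper `lookup` are assumptions for illustration.

```javascript
const customers = [
  { _id: "c1", name: "John Doe" },
  { _id: "c2", name: "Alice" },
];
const orders = [
  { _id: "o1", customerId: "c1" },
  { _id: "o2", customerId: "c2" },
  { _id: "o3", customerId: "c9" }, // no matching customer
];

function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  // Index the foreign side once instead of scanning it per local document.
  const index = new Map();
  for (const doc of foreignDocs) {
    const key = doc[foreignField];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  // Left outer join: unmatched local documents get an empty array.
  return localDocs.map((doc) => ({
    ...doc,
    [as]: index.get(doc[localField]) ?? [],
  }));
}

const joined = lookup(orders, customers, "customerId", "_id", "customer");
console.log(JSON.stringify(joined, null, 2));
```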

4. 🔍 Query Limitations

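  • Limited SQL Features: No CTEs; window functions only through $setWindowFields (MongoDB 5.0+)
  • Complex Analytics: Less suited to complex ad hoc reporting than a SQL database
  • Aggregation Memory Limits: 100 MB of RAM per pipeline stage unless allowDiskUse lets the stage spill to disk
  • Collection Scans: Queries without a supporting index scan the whole collection (COLLSCAN)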

5. 📏 Storage Overhead

  • Field Name Duplication: Field names stored in every document
  • BSON Overhead: Binary format adds size
  • Fragmentation: Updates and deletes can leave unused space in data files that is not returned to the OS automatically
  • Index Size: Multiple indexes increase storage requirements
// Storage overhead example
{
  "very_long_field_name_that_gets_repeated": "value1",  // Field name stored
  "another_very_long_field_name": "value2",             // in every document
  "yet_another_long_field_name": "value3"               // causing overhead
}
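
The cost of long field names can be estimated from the BSON layout: a document is a 4-byte length, its elements, and a terminating zero byte, and each string element stores a type byte, the field name, and the value with its own length prefix. The sketch below handles only flat documents with string values; `bsonSizeOfStringDoc` is a hypothetical helper, and block compression in WiredTiger reduces the on-disk impact.

```javascript
// Approximate BSON size of a flat document whose values are all strings.
// Document: int32 total length + elements + 0x00 terminator.
// String element: type byte + field name + 0x00 + int32 length + UTF-8 bytes + 0x00.
function bsonSizeOfStringDoc(doc) {
  let size = 4 + 1; // length prefix and terminator
  for (const [name, value] of Object.entries(doc)) {
    size += 1 + Buffer.byteLength(name, "utf8") + 1; // type, name, null
    size += 4 + Buffer.byteLength(value, "utf8") + 1; // length, value, null
  }
  return size;
}

const longNames = {
  very_long_field_name_that_gets_repeated: "value1",
  another_very_long_field_name: "value2",
  yet_another_long_field_name: "value3",
};
const shortNames = { a: "value1", b: "value2", c: "value3" };

const savedPerDocument = bsonSizeOfStringDoc(longNames) - bsonSizeOfStringDoc(shortNames);
console.log(savedPerDocument);
```

Shorter names save bytes on every document, but readability usually matters more unless the collection holds very many small documents.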

6. 🔧 Operational Complexity

  • Sharding Complexity: Shard key selection critical
  • Balancing Issues: Chunk migration can impact performance
  • Backup Challenges: Point-in-time recovery complexity
  • Monitoring Requirements: Need specialized monitoring tools

7. 💰 Enterprise Costs

  • Atlas Pricing: Managed service can be expensive
  • Enterprise Features: Advanced features require paid license
  • Scaling Costs: Horizontal scaling increases infrastructure costs
  • Support Costs: Professional support is not cheap

📊 Comparison with Competitors

MongoDB vs MySQL

Feature         MongoDB                                  MySQL
Schema          Flexible                                 Fixed
Scalability     Horizontal                               Vertical (mainly)
ACID            Multi-document since 4.0, at a cost      Full
Joins           $lookup (expensive)                      Native (efficient)
Learning Curve  Easy                                     Moderate
Analytics       Basic                                    Good

MongoDB vs PostgreSQL

Feature             MongoDB         PostgreSQL
JSON Support        Native          Good (JSONB)
Schema Flexibility  High            Medium
Complex Queries     Limited         Excellent
Performance         Good reads      Balanced
Ecosystem           NoSQL focused   SQL focused

MongoDB vs Cassandra

Feature            MongoDB                                       Cassandra
Data Model         Document                                      Column-family
Write Performance  Good                                          Excellent
Read Performance   Excellent                                     Good
Consistency        Strong on primary reads; eventual on          Tunable
                   secondaries; tunable via read/write concern
Query Language     Rich                                          Limited (CQL)

🎯 When to Choose MongoDB

✅ Suitable For:

  • Content Management: Blogs, CMS, catalogs
  • Real-time Analytics: Event logging, user tracking
  • IoT Applications: Sensor data, time-series data
  • Mobile Applications: Offline-first apps
  • Rapid Development: Prototyping, agile development
  • Flexible Schema: Evolving data models
  • Geographic Data: Location-based services
  • Caching Layer: Session storage, temporary data

❌ Avoid When:

  • Complex Transactions: Banking, financial systems
  • Heavy Analytics: Business intelligence, reporting
  • Fixed Schema: Well-defined, stable data structures
  • Budget Constraints: Cost-sensitive applications
  • Small Team: Limited NoSQL expertise
  • Regulatory Compliance: Strict ACID requirements
  • Complex Relationships: Heavy relational data

💡 Best Practices To Mitigate Weaknesses

1. Schema Design

// Embed related data to avoid joins
{
  "_id": ObjectId("..."),
  "order": {
    "customer": {              // Embedded customer info
      "name": "John Doe",
      "email": "john@example.com"
    },
    "items": [...],            // Embedded order items
    "shipping": {...}          // Embedded shipping info
  }
}

2. Indexing Strategy

// Compound indexes for query patterns
db.users.createIndex({ 
  "status": 1,           // Equality first
  "createdAt": -1,       // Sort second  
  "age": 1               // Range last
})

// Partial indexes to save space
db.users.createIndex(
  { "email": 1 },
  { partialFilterExpression: { "status": "active" }}
)
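
The field order in the compound index above follows the Equality, Sort, Range (ESR) guideline. A small sketch of a helper that assembles a key specification in that order; `buildEsrIndex` is hypothetical and simply builds the object that would be passed to createIndex.

```javascript
// Build a compound index key spec following the ESR guideline:
// equality fields first, then sort fields, then range fields.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of Object.entries(sort)) spec[field] = direction;
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1; // skip fields already covered
  }
  return spec;
}

// Query pattern: status == "active", sorted by createdAt descending, age in a range.
const indexSpec = buildEsrIndex({
  equality: ["status"],
  sort: { createdAt: -1 },
  range: ["age"],
});

console.log(JSON.stringify(indexSpec));
```

Placing the range field last lets the index satisfy the sort without an in-memory sort stage; explain() output is the way to confirm this for a real query.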

3. Connection Management

// Connection pooling
const { MongoClient } = require('mongodb');
const uri = 'mongodb://localhost:27017';  // Placeholder; use your deployment's URI
const client = new MongoClient(uri, {
  maxPoolSize: 10,        // Limit connection pool
  serverSelectionTimeoutMS: 5000,
  socketTimeoutMS: 45000,
})

4. Memory Optimization

// Monitor memory usage
db.serverStatus().wiredTiger.cache

// Use projections to limit data transfer
db.users.find(
  { "status": "active" },
  { "name": 1, "email": 1, "_id": 0 }  // Only return needed fields
)

5. Transaction Best Practices

// Keep transactions short
const session = db.getMongo().startSession()
// Operations join the transaction only when run through the session's database handle
const sessionDb = session.getDatabase(db.getName())
try {
  session.startTransaction()

  // Minimize operations in transaction
  sessionDb.collection1.updateOne({...}, {...})
  sessionDb.collection2.insertOne({...})

  session.commitTransaction()
} catch (error) {
  session.abortTransaction()
  throw error
} finally {
  session.endSession()
}
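
Short transactions can still fail with transient errors such as write conflicts, and the usual response is to retry the whole transaction. A minimal sketch of that retry loop, assuming the thrown error exposes an `errorLabels` array containing "TransientTransactionError"; the helper `runWithRetry`, the attempt limit, and the simulated failure are illustrative assumptions.

```javascript
// Run `txnFn` and retry it while the error carries the
// "TransientTransactionError" label. `maxAttempts` is an assumed policy.
function runWithRetry(txnFn, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return txnFn(attempt);
    } catch (error) {
      const transient =
        Array.isArray(error.errorLabels) &&
        error.errorLabels.includes("TransientTransactionError");
      // Give up on non-transient errors or when attempts are exhausted.
      if (!transient || attempt >= maxAttempts) throw error;
    }
  }
}

// Stand-in for a transaction body that fails once, then succeeds.
let calls = 0;
const result = runWithRetry(() => {
  calls += 1;
  if (calls === 1) {
    const err = new Error("simulated write conflict");
    err.errorLabels = ["TransientTransactionError"];
    throw err;
  }
  return "committed";
});

console.log(result, calls);
```

In real use the function passed in would start, run, and commit the transaction, and abort it before rethrowing, so that each retry begins from a clean state.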

6. Monitoring Setup

// Enable profiling for slow operations
db.setProfilingLevel(1, { slowms: 100 })

// Monitor with mongostat (run from the OS shell, not mongosh):
//   mongostat --host localhost:27017

// Use MongoDB Compass for GUI monitoring

Understanding MongoDB's strengths and limitations helps you use it effectively and answer interview questions well.