11. Hash Tables (Bảng băm)

🎯 Mục tiêu học tập

Hiểu cách hoạt động của hash function và collision handling
Cài đặt hash table từ đầu
Phân tích load factor và performance
Áp dụng hash table vào bài toán thực tế

📚 Lý thuyết

Hash Table là gì?

Định nghĩa: Cấu trúc dữ liệu sử dụng hash function để map keys thành indexes của array.

Thành phần chính:

Hash Function: h(key) → index
Buckets/Slots: Array để lưu data
Collision Resolution: Xử lý khi nhiều keys có cùng hash

🔧 Hash Functions

1. Division Method

int hash(int key, int tableSize) {
    return key % tableSize;  // tableSize thường là số nguyên tố
}

2. Multiplication Method

int hash(int key, int tableSize) {
    double A = 0.6180339887;  // (√5 - 1) / 2
    return (int) (tableSize * ((key * A) % 1));
}

3. String Hashing

int hash(String key, int tableSize) {
    int hash = 0;
    for (int i = 0; i < key.length(); i++) {
        hash = (hash * 31 + key.charAt(i)) % tableSize;
    }
    return hash;
}

💥 Collision Resolution

1. Chaining (Separate Chaining)

// Mỗi bucket là một linked list
class HashTable {
    private LinkedList<Entry>[] table;

    // Collision → thêm vào linked list
}

2. Open Addressing

Linear Probing

// h(key) + i (i = 0, 1, 2, ...)
int probe(int hash, int i, int tableSize) {
    return (hash + i) % tableSize;
}

Quadratic Probing

// h(key) + i² (i = 0, 1, 2, ...)
int probe(int hash, int i, int tableSize) {
    return (hash + i * i) % tableSize;
}

Double Hashing

// h1(key) + i * h2(key)
int probe(int key, int i, int tableSize) {
    int h1 = hash1(key);
    int h2 = hash2(key);
    return (h1 + i * h2) % tableSize;
}

📊 Performance Analysis

Time Complexity:

Average Case: O(1) cho search, insert, delete
Worst Case: O(n) khi tất cả keys collision

Load Factor (α):

α = n / m  // n = số elements, m = table size

Chaining: α có thể > 1
Open Addressing: α phải < 1

Khi nào resize?

Chaining: α > 0.75
Open Addressing: α > 0.5

🔄 Dynamic Resizing

void resize() {
    Entry[] oldTable = table;
    table = new Entry[oldTable.length * 2];
    size = 0;

    // Rehash tất cả elements
    for (Entry entry : oldTable) {
        if (entry != null) {
            put(entry.key, entry.value);
        }
    }
}

🎯 Applications

1. HashSet & HashMap (Java)

HashSet<String> set = new HashSet<>();  // Unique elements
HashMap<String, Integer> map = new HashMap<>();  // Key-value pairs

2. Database Indexing

B+ trees vs Hash indexes
Fast equality lookups

3. Caching

// LRU Cache implementation
class LRUCache {
    HashMap<Integer, Node> map;
    DoublyLinkedList dll;
}

4. Symbol Tables

Compiler/interpreter symbol tables
Variable name → memory address

📋 Bài tập thực hành

Cơ bản:

Implement hash table với chaining
Implement hash table với linear probing
Design hash function cho strings
Count frequency of elements

Trung bình:

Two Sum problem
Group Anagrams
First Non-repeating Character
Longest Substring Without Repeating Characters

Nâng cao:

LRU Cache implementation
Design Twitter
Implement magic dictionary
Random data structure với O(1) operations

💡 Design Considerations

1. Hash Function Quality

Uniform distribution
Fast computation
Deterministic

2. Collision Strategy

Few collisions: Open addressing
Many collisions: Chaining

3. Memory vs Performance

Memory limited: Open addressing
Performance critical: Chaining với good hash function

⚠️ Common Issues

1. Poor Hash Function

// Bad: return key % 2;  // Chỉ dùng 2 buckets
// Good: return key % primeNumber;

2. High Load Factor

// Monitor và resize khi cần thiết
if (size >= table.length * loadFactorThreshold) {
    resize();
}

3. Hash Code Consistency

// Trong Java: nếu override equals() thì phải override hashCode()

🚀 Chuẩn bị code

mkdir 11-hash-tables
# Files: 
# - HashTableChaining.java
# - HashTableOpenAddressing.java  
# - HashFunctions.java
# - HashTableApplications.java