Common Database Knowledge

This file consolidates fundamental database concepts that are universally applicable across different database systems and are essential for any data professional.


ACID Properties

ACID is an acronym that refers to the four key properties of a transaction: Atomicity, Consistency, Isolation, and Durability. These properties guarantee that database transactions are processed reliably.

  • Atomicity (Tính nguyên tử): Ensures that a transaction is treated as a single, indivisible unit. It either completes fully, or it doesn't happen at all. If any part of the transaction fails, the entire transaction is rolled back.
  • Consistency (Tính nhất quán): Guarantees that a transaction brings the database from one valid state to another. All data integrity constraints (like primary keys, foreign keys, and checks) must be satisfied.
  • Isolation (Tính cô lập): Ensures that concurrent transactions do not interfere with each other. The result of concurrent transactions is the same as if they were executed sequentially.
  • Durability (Tính bền vững): Guarantees that once a transaction has been committed, it will remain committed even in the event of a system failure (e.g., power outage or crash).

Detailed Explanation and Examples

A - Atomicity (Tính Nguyên Tử)

"All or Nothing" - Transaction phải được thực hiện hoàn toàn hoặc không thực hiện gì cả.

Ví Dụ Thực Tế

-- Transfer money between accounts
BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Debit
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- Credit
COMMIT;

-- Nếu bất kỳ operation nào fail → rollback toàn bộ
-- Không có tình trạng money disappears hoặc duplicates

Implementation Mechanisms - Write-Ahead Logging (WAL): Log changes to disk before modifying data. Recovery process can replay logs after crash. - Shadow Paging: Original page is copied, changes are made to the copy, and then an atomic pointer switch updates the reference.

C - Consistency (Tính Nhất Quán)

Database phải luôn ở valid state trước và sau mỗi transaction. Mọi business rules và constraints phải được maintain.

Consistency Types - Entity Integrity: Primary key không được null và phải unique. - Referential Integrity: Foreign key phải reference existing record. - Domain Integrity: Data must follow business rules (e.g., price >= 0). - User-Defined Integrity: Custom business rules enforced via triggers or application logic.

Ví Dụ Vi Phạm Tính Nhất Quán

-- Business rule: Total money in system must remain constant
BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 'A';
  -- System failure before next statement
  UPDATE accounts SET balance = balance + 100 WHERE id = 'B';
-- Result: Money disappeared if transaction is not atomic and consistent.

I - Isolation (Tính Cô Lập)

Concurrent transactions không được ảnh hưởng đến nhau. Mỗi transaction như được thực hiện độc lập.

Isolation Levels - Read Uncommitted: Lowest isolation. Allows dirty reads. - Read Committed: Prevents dirty reads. Allows non-repeatable reads and phantom reads. - Repeatable Read: Prevents dirty reads and non-repeatable reads. Allows phantom reads. - Serializable: Highest isolation. Prevents dirty reads, non-repeatable reads, and phantom reads.

Concurrency Problems - Dirty Read: Reading uncommitted changes made by another transaction. - Non-Repeatable Read: Reading the same data twice and getting different values because another transaction modified it. - Phantom Read: A query returns a different set of rows when executed twice because another transaction inserted or deleted rows.

Implementation Mechanisms - Locking: Shared locks for reads, exclusive locks for writes. Two-Phase Locking (2PL) is a common protocol. - Multi-Version Concurrency Control (MVCC): Each row has multiple versions with timestamps. Readers see appropriate versions based on transaction start time, writers create new versions.

D - Durability (Tính Bền Vững)

Committed changes phải persist permanently, ngay cả khi system crash, power failure, hay hardware failure.

Implementation Mechanisms - Write-Ahead Logging (WAL): Log changes to disk before modifying data. Recovery process can replay logs after crash. - Checkpointing: Periodically flush all dirty pages to disk, creating consistent recovery points.

ACID in Different Database Types

  • Relational Databases (SQL): Generally provide full ACID compliance (e.g., PostgreSQL, MySQL InnoDB).
  • NoSQL Databases: Often relax some ACID properties for scalability and availability (e.g., MongoDB offers ACID for single documents, multi-document transactions have performance costs; Redis has limited ACID properties).

ACID Trade-offs

  • Performance vs ACID: Strong ACID guarantees often come with lower performance and scalability. Relaxed ACID (e.g., eventual consistency) can offer higher performance and better scalability.
  • CAP Theorem Impact: ACID systems typically prioritize Consistency over Availability in the presence of Partition Tolerance. NoSQL systems often choose Availability and Partition Tolerance over strong Consistency.

Indexing

An index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

Why Use Indexes?

  • Faster Queries: Drastically reduces the time it takes to find specific rows, especially in large tables.
  • Improved Performance: Speeds up WHERE clauses, JOIN operations, and ORDER BY clauses.

Common Index Types:

  • B-Tree Index: The most common type of index, suitable for a wide range of queries, including equality and range-based lookups.
  • Hash Index: Used for equality comparisons (=). Very fast for lookups but not suitable for range queries (<, >).
  • Full-Text Index: Used for searching text within a column.
  • Composite Index: An index on two or more columns.

Stored Procedures

A stored procedure is a set of SQL statements that has been created and stored in the database. It can be called by applications, triggers, or other stored procedures.

Advantages of Stored Procedures:

  • Performance: Stored procedures are pre-compiled, which can result in better performance.
  • Reduced Network Traffic: Instead of sending multiple SQL statements over the network, an application only needs to send the name of the stored procedure and its parameters.
  • Security: Provides an extra layer of security by allowing users to perform actions on the data without giving them direct access to the tables.
  • Reusability and Consistency: Encapsulates business logic, ensuring that it is applied consistently across different applications.

Example:

CREATE PROCEDURE GetCustomerOrders(IN customerId INT)
BEGIN
    SELECT o.OrderID, o.OrderDate, p.ProductName
    FROM Orders o
    JOIN OrderDetails od ON o.OrderID = od.OrderID
    JOIN Products p ON od.ProductID = p.ProductID
    WHERE o.CustomerID = customerId;
END;

-- To call the procedure:
CALL GetCustomerOrders(10);

Triggers

A trigger is a special type of stored procedure that automatically runs when a specific event (like INSERT, UPDATE, or DELETE) occurs on a table.

Use Cases for Triggers:

  • Auditing: Automatically logging changes made to a table.
  • Enforcing Complex Business Rules: Implementing rules that cannot be enforced with standard constraints.
  • Maintaining Data Integrity: Keeping related data in sync across different tables.

Example:

-- A trigger to log updates to the Employees table
CREATE TRIGGER AfterEmployeeUpdate
AFTER UPDATE ON Employees
FOR EACH ROW
BEGIN
    INSERT INTO AuditLog(TableName, RecordID, Action, Timestamp)
    VALUES ('Employees', OLD.EmployeeID, 'UPDATE', NOW());
END;

Views

A view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.

Advantages of Views:

  • Simplicity: Can simplify complex queries by providing a simple interface to the underlying tables.
  • Security: Can be used to restrict access to data, allowing users to see only specific columns or rows.
  • Logical Data Independence: The schema of the underlying tables can be changed without affecting the applications that use the view.