0

Index — a data structure that speeds up reads at the cost of write overhead

Beginner5 min read·eng-15-006
sqlinterviewperformance

Concept

Index — a separate data structure that maintains a sorted or hashed copy of one or more columns, allowing the database to find rows much faster than scanning the whole table.

The analogy: An index is like the index at the back of a book. Instead of reading all 500 pages to find "polymorphism," you go to the index → "polymorphism: 234, 278." Fast lookup.

Without an index: SELECT * FROM users WHERE email = 'alice@example.com' scans EVERY row in sequence (Full Table Scan). For 1 million users, that's 1 million row reads. O(n).

With an index on email: The database navigates a B-tree (balanced tree) to find the exact row in O(log n) — a few reads regardless of table size.

B-tree index (most common): Self-balancing tree. Supports equality (=), range (<, >, BETWEEN), prefix LIKE 'abc%'. Used for most standard indexes.

Hash index: Hash map. Ultra-fast equality lookup O(1). But ONLY equality — no range queries, no LIKE. MySQL InnoDB doesn't support hash indexes (Memory engine does). PostgreSQL has hash indexes.

Composite index: Index on multiple columns. CREATE INDEX ON orders (user_id, status). Order matters: a composite index (a, b) helps queries on a alone or (a, b) together, but NOT on b alone. This is the "leftmost prefix rule."

Index trade-offs:

  • Reads faster: Dramatically — index lookups vs full scan.
  • Writes slower: Every INSERT, UPDATE, DELETE must update all affected indexes.
  • Storage cost: Indexes are stored separately — can double table size.
  • Too many indexes: Slows writes and confuses the query planner.

Covering index: An index that contains ALL columns needed by a query — the DB can answer the query from the index alone, never touching the main table (index-only scan).

Code Example

php
<?php
// Creating indexes in Laravel migrations
Schema::create('orders', function (Blueprint $table) {
    $table->id();
    $table->foreignId('user_id');
    $table->string('status');
    $table->decimal('total', 10, 2);
    $table->timestamp('created_at');

    // Single-column indexes
    $table->index('user_id');                    // searches by user
    $table->index('status');                     // searches by status

    // Composite index — (user_id, status) together
    $table->index(['user_id', 'status']);         // helps: WHERE user_id=? AND status=?
                                                 // also helps: WHERE user_id=?
                                                 // does NOT help: WHERE status=? alone

    // Unique index — enforces uniqueness + speeds up lookups
    $table->unique('email');
});

// Adding an index to an existing table
Schema::table('orders', function (Blueprint $table) {
    $table->index(['status', 'created_at'], 'idx_orders_status_created');
});

// COVERING INDEX example
// Query: SELECT user_id, status, total FROM orders WHERE user_id = 1
// Covering index: CREATE INDEX idx_cover ON orders (user_id, status, total)
// → DB can answer from the index alone — never reads the main table rows!
\DB::statement('CREATE INDEX idx_orders_cover ON orders (user_id, status, total)');

// EXPLAIN — check if your index is being used
$explain = \DB::select('EXPLAIN SELECT * FROM orders WHERE user_id = 1 AND status = ?', ['pending']);
// Look for:
// type: 'ref' or 'range' — index used (GOOD)
// type: 'ALL' — full table scan (BAD — add an index!)
// key: name of the index being used
// rows: estimated rows examined (lower is better)

// When NOT to index
// Columns with low cardinality (few unique values) — e.g., boolean 'active'
// The optimizer may not use an index if >20-30% of rows match
// $table->index('active'); // OFTEN USELESS — only 2 distinct values

// Composite index leftmost prefix rule:
// Index (user_id, status, created_at) helps:
// WHERE user_id = 1                          ✓ (leftmost prefix)
// WHERE user_id = 1 AND status = 'pending'   ✓ (first two columns)
// WHERE user_id = 1 AND status = 'pending' AND created_at > ?  ✓ (all three)
// WHERE status = 'pending'                   ✗ (not leftmost column — full scan)
// WHERE user_id = 1 AND created_at > ?       ✓ partially (user_id used, created_at might not be)