0

Database sharding and read replicas — how and when

Expert5 min read·eng-11-005
interviewsqlperformance

Concept

Database sharding and read replicas — scaling the database tier horizontally.

Read replicas (simplest first step):

  • One primary (write) + N replicas (read-only copies).
  • Primary: receives all writes, replicates changes to replicas asynchronously.
  • Replicas: receive read traffic. No writes allowed.
  • Replication lag: Replicas are slightly behind the primary (usually milliseconds). A user who just wrote data might read stale data from a replica. Fix: route reads for "just written" data to the primary (sticky: true in Laravel).
  • Laravel supports this natively: read/write connection arrays.
  • Typical ratio: 1 primary + 2-4 replicas handles 4-5× read throughput.

Sharding: Splitting data across multiple databases based on a shard key. Each shard holds a subset of rows.

Why shard: When data is too large to fit on one server OR when write throughput exceeds one server's capacity. (Most apps never need sharding — optimize elsewhere first.)

Shard key selection: Critical decision. A bad shard key creates "hot shards" (most traffic to one shard). Good shard keys distribute load evenly.

  • User ID: shard = user_id % N_shards. Balanced, but cross-user queries span all shards.
  • Geographic: Shard by country/region. Satisfies data residency laws.
  • Date: Shard by time period. Good for time-series data. Old shards become read-only.

Problems with sharding:

  • Cross-shard queries: JOIN across shards is impossible in SQL. Must fetch from each shard in PHP and merge. Or denormalize.
  • Transactions: Can't span shards. Need distributed transactions (complex) or design around them.
  • Schema changes: Must apply to all shards.
  • Rebalancing: Moving data when adding shards is complex (consistent hashing helps).

Consistent hashing: A technique where adding/removing shards only moves 1/N of the data (not all of it). Used by Redis Cluster, Cassandra, Memcached.

Code Example

php
<?php
// config/database.php — read/write split (replicas)
'mysql' => [
    'read'  => [
        'host' => [
            env('DB_READ_HOST_1', '10.0.0.2'),
            env('DB_READ_HOST_2', '10.0.0.3'), // round-robin across replicas
        ],
    ],
    'write' => ['host' => [env('DB_WRITE_HOST', '10.0.0.1')]],
    'sticky' => true,  // use write connection for current request after any write
    'driver'   => 'mysql',
    'database' => env('DB_DATABASE'),
    'username' => env('DB_USERNAME'),
    'password' => env('DB_PASSWORD'),
],

// ============================================================
// Manual sharding — shard by user_id
// ============================================================
class ShardedUserRepository
{
    private array $connections = [];
    private int   $numShards = 4;

    public function __construct()
    {
        for ($i = 0; $i < $this->numShards; $i++) {
            $this->connections[$i] = new \PDO(
                "mysql:host=shard{$i}.internal;dbname=users",
                env('DB_USERNAME'), env('DB_PASSWORD'),
            );
        }
    }

    private function getConnection(int $userId): \PDO
    {
        return $this->connections[$userId % $this->numShards]; // hash to shard
    }

    public function find(int $userId): ?array
    {
        $conn = $this->getConnection($userId);
        $stmt = $conn->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->execute([$userId]);
        return $stmt->fetch() ?: null;
    }

    public function create(array $data): int
    {
        $userId = $this->getNextId(); // must be globally unique
        $conn   = $this->getConnection($userId);
        $stmt   = $conn->prepare('INSERT INTO users (id, name, email) VALUES (?, ?, ?)');
        $stmt->execute([$userId, $data['name'], $data['email']]);
        return $userId;
    }

    private function getNextId(): int
    {
        // Use a central ID service (Redis INCR, Snowflake, or UUID)
        return (int) \Cache::increment('global_user_id');
    }

    // Cross-shard query — must fan out to all shards
    public function findByEmail(string $email): ?array
    {
        foreach ($this->connections as $conn) {
            $stmt = $conn->prepare('SELECT * FROM users WHERE email = ?');
            $stmt->execute([$email]);
            if ($user = $stmt->fetch()) return $user;
        }
        return null; // O(n_shards) — expensive!
        // Better: keep a separate email→user_id lookup in Redis
    }
}