Database sharding and read replicas — how and when
Expert5 min read·eng-11-005
interviewsqlperformance
Concept
Database sharding and read replicas — scaling the database tier horizontally.
Read replicas (simplest first step):
- One primary (write) + N replicas (read-only copies).
- Primary: receives all writes, replicates changes to replicas asynchronously.
- Replicas: receive read traffic. No writes allowed.
- Replication lag: Replicas are slightly behind the primary (usually milliseconds). A user who just wrote data might read stale data from a replica. Fix: route reads for "just written" data to the primary (
sticky: truein Laravel). - Laravel supports this natively:
read/writeconnection arrays. - Typical ratio: 1 primary + 2-4 replicas handles 4-5× read throughput.
Sharding: Splitting data across multiple databases based on a shard key. Each shard holds a subset of rows.
Why shard: When data is too large to fit on one server OR when write throughput exceeds one server's capacity. (Most apps never need sharding — optimize elsewhere first.)
Shard key selection: Critical decision. A bad shard key creates "hot shards" (most traffic to one shard). Good shard keys distribute load evenly.
- User ID:
shard = user_id % N_shards. Balanced, but cross-user queries span all shards. - Geographic: Shard by country/region. Satisfies data residency laws.
- Date: Shard by time period. Good for time-series data. Old shards become read-only.
Problems with sharding:
- Cross-shard queries:
JOINacross shards is impossible in SQL. Must fetch from each shard in PHP and merge. Or denormalize. - Transactions: Can't span shards. Need distributed transactions (complex) or design around them.
- Schema changes: Must apply to all shards.
- Rebalancing: Moving data when adding shards is complex (consistent hashing helps).
Consistent hashing: A technique where adding/removing shards only moves 1/N of the data (not all of it). Used by Redis Cluster, Cassandra, Memcached.
Code Example
php
<?php
// config/database.php — read/write split (replicas)
'mysql' => [
'read' => [
'host' => [
env('DB_READ_HOST_1', '10.0.0.2'),
env('DB_READ_HOST_2', '10.0.0.3'), // round-robin across replicas
],
],
'write' => ['host' => [env('DB_WRITE_HOST', '10.0.0.1')]],
'sticky' => true, // use write connection for current request after any write
'driver' => 'mysql',
'database' => env('DB_DATABASE'),
'username' => env('DB_USERNAME'),
'password' => env('DB_PASSWORD'),
],
// ============================================================
// Manual sharding — shard by user_id
// ============================================================
class ShardedUserRepository
{
private array $connections = [];
private int $numShards = 4;
public function __construct()
{
for ($i = 0; $i < $this->numShards; $i++) {
$this->connections[$i] = new \PDO(
"mysql:host=shard{$i}.internal;dbname=users",
env('DB_USERNAME'), env('DB_PASSWORD'),
);
}
}
private function getConnection(int $userId): \PDO
{
return $this->connections[$userId % $this->numShards]; // hash to shard
}
public function find(int $userId): ?array
{
$conn = $this->getConnection($userId);
$stmt = $conn->prepare('SELECT * FROM users WHERE id = ?');
$stmt->execute([$userId]);
return $stmt->fetch() ?: null;
}
public function create(array $data): int
{
$userId = $this->getNextId(); // must be globally unique
$conn = $this->getConnection($userId);
$stmt = $conn->prepare('INSERT INTO users (id, name, email) VALUES (?, ?, ?)');
$stmt->execute([$userId, $data['name'], $data['email']]);
return $userId;
}
private function getNextId(): int
{
// Use a central ID service (Redis INCR, Snowflake, or UUID)
return (int) \Cache::increment('global_user_id');
}
// Cross-shard query — must fan out to all shards
public function findByEmail(string $email): ?array
{
foreach ($this->connections as $conn) {
$stmt = $conn->prepare('SELECT * FROM users WHERE email = ?');
$stmt->execute([$email]);
if ($user = $stmt->fetch()) return $user;
}
return null; // O(n_shards) — expensive!
// Better: keep a separate email→user_id lookup in Redis
}
}