CAP theorem — what it means for a PHP app developer

Concept

CAP theorem states that a distributed data system can guarantee at most two of three properties:

  • C — Consistency: Every read receives the most recent write or an error, so all nodes appear to hold the same data at the same time. (This is linearizability, a different property from the C in ACID.)
  • A — Availability: Every request received by a non-failing node gets a response (not an error), though it may not reflect the most recent write.
  • P — Partition Tolerance: The system continues operating even when network partitions occur (nodes can't communicate with each other).
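To make the three definitions concrete, here is a minimal in-memory sketch. `Node` and `Network` are hypothetical toy classes, not a real replication protocol: a write is copied to the other node only while the link is up, so once the link drops, the cut-off node can no longer be both available and consistent.

```php
<?php
declare(strict_types=1);

// Hypothetical toy node: holds its own copy of a key/value map.
final class Node
{
    /** @var array<string, string> */
    public array $data = [];
}

// Hypothetical two-node network: a write is copied to the peer only while the link is up.
final class Network
{
    public bool $partitioned = false;

    public function __construct(public Node $a, public Node $b) {}

    public function write(Node $target, string $key, string $value): void
    {
        $target->data[$key] = $value;
        if (!$this->partitioned) {
            // Link up: replicate immediately to the other node
            $peer = $target === $this->a ? $this->b : $this->a;
            $peer->data[$key] = $value;
        }
        // Link down: the peer silently misses this write
    }
}

$a = new Node();
$b = new Node();
$net = new Network($a, $b);

$net->write($a, 'price', '10'); // replicated to B
$net->partitioned = true;       // P: the link between A and B drops
$net->write($a, 'price', '12'); // only A sees this

// A read sent to B can now return '10' (available, not consistent)
// or an error (consistent, not available), but never the latest value.
echo $b->data['price'], PHP_EOL; // prints 10
```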

The key insight: Network partitions (P) WILL happen in any distributed system. You can't avoid them. Therefore, when a partition occurs, you must choose between C and A:

  • CP systems: Sacrifice availability. When a partition occurs, refuse requests that might return inconsistent data. Systems: HBase, MongoDB (write concern = majority), ZooKeeper.
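  • AP systems: Sacrifice consistency. When a partition occurs, continue serving requests, possibly with stale data. Systems: DynamoDB, Cassandra, CouchDB (at their default settings; DynamoDB and Cassandra both let you request stronger consistency per operation).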
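How does a CP system decide when to refuse? One common mechanism is a majority quorum, roughly the idea behind MongoDB's majority write concern. The sketch below assumes a hypothetical `ToyCluster` that is simply told how many nodes are reachable; in CP mode it rejects writes on the minority side of a partition, while in AP mode it accepts them locally and queues them for later reconciliation.

```php
<?php
declare(strict_types=1);

final class QuorumUnavailable extends RuntimeException {}

// Hypothetical cluster model. $reachable = how many nodes (including the
// receiving node itself) can currently talk to the node handling the write.
final class ToyCluster
{
    /** @var list<string> writes accepted locally but not confirmed by a majority */
    public array $pendingReconcile = [];

    // $mode is 'CP' or 'AP'
    public function __construct(private int $clusterSize, private string $mode) {}

    public function write(string $value, int $reachable): string
    {
        $majority = intdiv($this->clusterSize, 2) + 1;

        if ($reachable >= $majority) {
            return "committed:{$value}"; // majority reachable: safe in both modes
        }

        if ($this->mode === 'CP') {
            // Minority side of a partition: refuse rather than risk divergence
            throw new QuorumUnavailable("only {$reachable}/{$this->clusterSize} nodes reachable");
        }

        // AP: accept locally and reconcile when the partition heals (may conflict)
        $this->pendingReconcile[] = $value;
        return "accepted-locally:{$value}";
    }
}

// 3-node cluster; a partition splits it into a side of 2 and a side of 1
$cp = new ToyCluster(3, 'CP');
echo $cp->write('x', 2), PHP_EOL; // committed:x
try {
    $cp->write('y', 1);
} catch (QuorumUnavailable $e) {
    echo 'CP refused: ', $e->getMessage(), PHP_EOL; // CP refused: only 1/3 nodes reachable
}

$ap = new ToyCluster(3, 'AP');
echo $ap->write('y', 1), PHP_EOL; // accepted-locally:y
```

With three nodes, at most one side of any partition can hold a majority, so the CP cluster can never commit conflicting writes on both sides; the AP cluster stays writable everywhere and pays for it later in reconciliation.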

What this means for PHP developers:

  • MySQL (single node): Not distributed — CAP doesn't strictly apply. Provides ACID guarantees.
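  • MySQL with replication: Not cleanly CP or AP. Replication is asynchronous by default, so if the primary fails and a replica isn't fully caught up, you choose: stop serving until a fully synced node is promoted (consistent but unavailable), or promote or read from the lagging replica (available, but stale, and recently acknowledged writes it never received can be lost).
  • Redis: Typically AP. Replication is asynchronous, so during a Sentinel or Cluster failover a replica can be promoted without the last writes the old primary acknowledged; expect brief inconsistency and possible write loss.
  • Read replicas: An AP compromise. Reads from replicas are stale by however far replication is lagging (usually small, but it can grow to seconds or more under heavy write load): available, but not always consistent.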
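One way to manage the replica trade-off above is to route a read to a replica only while its measured lag is below what that particular data can tolerate, and otherwise fall back to the primary. The `LagAwareRouter` class below is a hypothetical illustration, not a framework API. In MySQL the lag figure is commonly taken from `Seconds_Behind_Source` in `SHOW REPLICA STATUS` (`Seconds_Behind_Master` in `SHOW SLAVE STATUS` on older versions), which is an approximation and is NULL when replication is not running.

```php
<?php
declare(strict_types=1);

// Hypothetical router: picks a connection name per read based on how stale
// the caller can tolerate the data being.
final class LagAwareRouter
{
    // null = lag unknown (e.g. replication stopped)
    public function __construct(private ?float $replicaLagSeconds) {}

    public function connectionFor(float $maxStalenessSeconds): string
    {
        // Unknown lag: assume the worst and use the primary
        if ($this->replicaLagSeconds === null) {
            return 'primary';
        }
        return $this->replicaLagSeconds <= $maxStalenessSeconds ? 'replica' : 'primary';
    }
}

$router = new LagAwareRouter(replicaLagSeconds: 3.0);
echo $router->connectionFor(60.0), PHP_EOL; // replica: a product listing tolerates a minute
echo $router->connectionFor(0.0), PHP_EOL;  // primary: a stock or balance check tolerates none
```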

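PACELC: A refinement of CAP. If there is a Partition (P), trade Availability (A) against Consistency (C); Else (E), even with no partition, trade Latency (L) against Consistency (C). For example, reading from a nearby replica is fast but may be stale, while waiting for replicas to confirm a write is consistent but slower. A useful model for everyday decisions, since every request pays that latency cost while partitions are comparatively rare.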
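The "else" half of PACELC shows up directly as commit latency. A back-of-the-envelope sketch, assuming made-up round-trip times to two replicas: asynchronous replication acknowledges after the local commit, semi-synchronous replication (MySQL's semisync waits for one replica acknowledgment by default) waits for the fastest replica, and fully synchronous replication waits for all of them. The `commitLatencyMs` helper and its numbers are illustrative only; it ignores queueing and disk variance.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: latency a client sees for one commit.
 *
 * @param float       $localCommitMs time to commit on the primary itself
 * @param list<float> $replicaRttMs  round-trip time to each replica
 */
function commitLatencyMs(float $localCommitMs, array $replicaRttMs, string $mode): float
{
    return match ($mode) {
        'async'    => $localCommitMs,                      // no waiting: lowest latency, replicas may lag
        'semisync' => $localCommitMs + min($replicaRttMs), // wait for the fastest single ack
        'sync'     => $localCommitMs + max($replicaRttMs), // wait for every replica
        default    => throw new InvalidArgumentException("unknown mode {$mode}"),
    };
}

$rtts = [5.0, 40.0]; // same-region and cross-region replica (made-up numbers)
foreach (['async', 'semisync', 'sync'] as $mode) {
    printf("%s: %.1f ms\n", $mode, commitLatencyMs(2.0, $rtts, $mode));
}
// Prints:
// async: 2.0 ms
// semisync: 7.0 ms
// sync: 42.0 ms
```

Each step down the list buys a stronger guarantee that the write survives on another node, paid for in latency on every commit, even on days with no partition at all.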

Practical takeaway: For most PHP apps, use a single-region relational DB (ACID compliant), Redis for caching (accept eventual consistency), and design your API to tolerate stale cache reads.
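One way to make an API tolerate stale cache reads is a soft/hard TTL: serve cached data while it is fresh, try to refresh it after a soft TTL, and if the refresh fails, keep serving the stale copy until a hard TTL expires. A minimal sketch, assuming a hypothetical in-memory `StaleTolerantCache` with an injected clock so it can be tested; in a real app the storage would be Redis or similar.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory cache with a soft TTL (time to refresh) and a
// hard TTL (time after which stale data is no longer acceptable).
final class StaleTolerantCache
{
    /** @var array<string, array{value: mixed, storedAt: int}> */
    private array $entries = [];

    /** @param Closure(): int $clock returns the current Unix time */
    public function __construct(private int $softTtl, private int $hardTtl, private Closure $clock) {}

    public function get(string $key, callable $loader): mixed
    {
        $now = ($this->clock)();
        $entry = $this->entries[$key] ?? null;
        $age = $entry === null ? null : $now - $entry['storedAt'];

        // Fresh enough: no need to touch the source of truth
        if ($entry !== null && $age < $this->softTtl) {
            return $entry['value'];
        }

        try {
            $value = $loader(); // e.g. a DB query; may throw during an outage
            $this->entries[$key] = ['value' => $value, 'storedAt' => $now];
            return $value;
        } catch (Throwable $e) {
            // Source unavailable: prefer a stale answer (availability) while it is
            // still within the hard TTL; otherwise surface the failure
            if ($entry !== null && $age < $this->hardTtl) {
                return $entry['value'];
            }
            throw $e;
        }
    }
}

$now = 1_000;
$cache = new StaleTolerantCache(60, 3600, function () use (&$now): int {
    return $now;
});

echo $cache->get('product:1', fn () => 'v1'), PHP_EOL; // v1 (loaded)
$now += 120;                                           // past the soft TTL
echo $cache->get('product:1', function () {
    throw new RuntimeException('DB unreachable');
}), PHP_EOL;                                           // v1 (stale, served anyway)
```

The hard TTL is the knob that decides how far you lean toward availability: the larger it is, the longer an outage can be hidden, and the staler the data users may see.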

Code Example

```php
<?php
// CAP in practice: choosing consistency vs availability per use case (Laravel).
// Assumes the default Laravel 8+ model namespace (App\Models).

use App\Models\Order;
use App\Models\Product;
use Illuminate\Database\QueryException;
use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\Log;

// ============================================================
// AVAILABILITY PREFERRED: serve stale or degraded data rather than fail
// ============================================================
class ProductService
{
    public function getProduct(int $id): array
    {
        // Try cache first; a hit may be slightly stale, which is OK here.
        // rescue() reports any exception (e.g. Redis unreachable) and returns null instead.
        $cached = rescue(fn () => Cache::get("product:{$id}"));
        if ($cached !== null) {
            return $cached;
        }

        // Cache miss (or cache down) → try the DB
        try {
            $product = Product::findOrFail($id)->toArray();
        } catch (QueryException $e) {
            // DB unreachable (e.g. a partition): return a degraded response instead of an error.
            // ModelNotFoundException is deliberately not caught: a missing product is a 404, not an outage.
            Log::error("Product service unavailable: {$e->getMessage()}");
            return ['id' => $id, 'name' => 'Product temporarily unavailable', 'available' => false];
        }

        // A failed cache write shouldn't fail the request (TTL in seconds)
        rescue(fn () => Cache::put("product:{$id}", $product, 3600));

        // Chose AVAILABILITY: always return something, even if stale or degraded
        return $product;
    }
}

// ============================================================
// CONSISTENCY PREFERRED: fail rather than return stale data
// ============================================================
class InventoryService
{
    public function getStock(int $productId): int
    {
        // For stock levels, staleness could cause overselling, so prefer consistency:
        // skip the cache and read from the primary (write) connection, never a lagging replica.
        // (lockForUpdate() only helps inside a DB::transaction() that also performs the update;
        // in autocommit mode the row lock is released as soon as the SELECT finishes.)
        return Product::onWriteConnection()->findOrFail($productId)->stock;
        // If the DB is unavailable, the QueryException propagates → show an error to the user.
        // Chose CONSISTENCY: fail rather than show a wrong stock count
    }
}

// ============================================================
// EVENTUAL CONSISTENCY: accept lag, design around it
// ============================================================
class OrderAnalytics
{
    public function getTotalRevenue(): float
    {
        // Analytics don't need to be real-time; eventual consistency is fine
        return (float) Cache::remember('analytics:total_revenue', 3600, function () {
            return Order::where('status', 'completed')->sum('total'); // may be up to 1 hour old
        });
        // Users see revenue up to 1 hour stale, which is acceptable for a dashboard
    }
}

// ============================================================
// Replication lag: the real-world AP compromise
// ============================================================
// In config/database.php, a read/write split (host names are placeholders):
//
//   'mysql' => [
//       'read'   => ['host' => ['replica-host']],
//       'write'  => ['host' => ['primary-host']],
//       'sticky' => true,
//       // ...driver, database, username, password as usual
//   ],
//
// With 'sticky' => true: once a write has happened in the current request, all later
// reads in that same request use the write connection (primary) instead of a replica,
// so the writer sees their own writes.
// With 'sticky' => false: you might write to the primary, then read from a replica
// that hasn't received the update yet → inconsistency!
// Stickiness lasts only for the current request; the next request (e.g. after a
// redirect) can still read from a lagging replica.
```
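To see why `sticky` matters without a Laravel install, here is a plain-PHP simulation of a primary, an asynchronously updated replica, and a router that pins reads to the primary after a write in the same request. `ToyRouter` and its methods are hypothetical; in Laravel the equivalent behavior comes from the `sticky` option, and, as in this sketch, it lasts only for the current request.

```php
<?php
declare(strict_types=1);

// Hypothetical router modelling primary/replica reads with optional stickiness.
final class ToyRouter
{
    /** @var array<int, string> */
    private array $primary = [];
    /** @var array<int, string> */
    private array $replica = [];
    /** @var list<array{int, string}> writes not yet applied on the replica */
    private array $replicationQueue = [];
    private bool $wroteThisRequest = false;

    public function __construct(private bool $sticky) {}

    public function write(int $id, string $name): void
    {
        $this->primary[$id] = $name;
        $this->replicationQueue[] = [$id, $name]; // async: the replica catches up later
        $this->wroteThisRequest = true;
    }

    public function read(int $id): ?string
    {
        // Sticky: after a write in this request, read from the primary
        if ($this->sticky && $this->wroteThisRequest) {
            return $this->primary[$id] ?? null;
        }
        return $this->replica[$id] ?? null; // may be stale
    }

    public function applyReplication(): void
    {
        foreach ($this->replicationQueue as [$id, $name]) {
            $this->replica[$id] = $name;
        }
        $this->replicationQueue = [];
    }

    public function endRequest(): void
    {
        $this->wroteThisRequest = false; // stickiness does not survive the request
    }
}

foreach ([false, true] as $sticky) {
    $db = new ToyRouter($sticky);
    $db->write(1, 'Alice');
    printf("sticky=%s read after write: %s\n", $sticky ? 'true' : 'false', $db->read(1) ?? 'NULL');
}
// Prints:
// sticky=false read after write: NULL
// sticky=true read after write: Alice
```

Calling `endRequest()` before `applyReplication()` reproduces the redirect case: the next "request" reads from the replica again and may still miss the write.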