Latency vs throughput — time for one request vs requests per second
Concept
Latency vs throughput — two distinct performance dimensions that are often confused but measure different things.
Latency: The time it takes to complete ONE operation. "How long does this request take?" Measured in milliseconds (ms) or microseconds (µs). Examples: response time of a single API call, time for one DB query.
Throughput: How many operations can be completed per unit of time. "How many requests per second can this handle?" Measured in requests/second (RPS), transactions/second (TPS), queries/second (QPS).
The relationship: They are linked through concurrency, but improving one does not automatically improve the other.
- A system can have LOW latency (fast responses) but LOW throughput (can only handle a few at a time).
- A system can have HIGH throughput (handles thousands/second) but HIGH latency (each one takes a long time — batch processing).
- The ideal: low latency AND high throughput.
Little's Law: Concurrency = Throughput × Latency, which rearranges to Throughput = Concurrency / Latency (with latency in seconds). If your average response time is 100 ms and 10 workers are kept busy, throughput = 10 / 0.1 s = 100 RPS. To double throughput: either halve latency OR double concurrency.
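The arithmetic above can be sketched in a few lines of PHP. The function names `maxThroughput` and `workersNeeded` are hypothetical helpers made up for this illustration, and the result is an upper bound that assumes every worker stays busy.

```php
<?php
declare(strict_types=1);

/**
 * Little's Law rearranged: throughput = concurrency / latency.
 * Hypothetical helper for illustration; latency is given in milliseconds.
 */
function maxThroughput(int $concurrency, float $avgLatencyMs): float
{
    if ($concurrency < 1 || $avgLatencyMs <= 0) {
        throw new InvalidArgumentException('concurrency must be >= 1 and latency > 0');
    }
    // Multiply before dividing so the ms-to-seconds conversion stays exact here.
    return $concurrency * 1000 / $avgLatencyMs; // requests per second
}

/** Workers needed to sustain a target throughput at a given average latency. */
function workersNeeded(float $targetRps, float $avgLatencyMs): int
{
    return (int) ceil($targetRps * $avgLatencyMs / 1000);
}

echo maxThroughput(10, 100), " RPS\n";      // 10 workers at 100 ms -> 100 RPS
echo maxThroughput(10, 50), " RPS\n";       // halve latency        -> 200 RPS
echo maxThroughput(20, 100), " RPS\n";      // double concurrency   -> 200 RPS
echo workersNeeded(250, 100), " workers\n"; // 250 RPS at 100 ms    -> 25 workers
```

Real systems usually fall short of this bound, because latency tends to rise as workers start competing for the database, CPU, or locks.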
Why they're confused: "Slow" can mean either. "The API is slow" might mean:
- High latency (a single request takes 2 seconds) — user experience problem.
- Low throughput (can only handle 5 RPS before degrading) — scaling problem.
Optimizing latency vs throughput:
- Latency: Optimize the critical path of a single request (DB indexes, caching, code optimization).
- Throughput: Add more workers/servers (horizontal scaling), use async processing, connection pooling.
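Percentiles: Average latency is misleading because a few very slow requests hide behind many fast ones. Example: p50 = 50 ms (the median; half of requests finish within this), p95 = 500 ms (95% of requests finish within this), p99 = 2000 ms (99% finish within this). The p99 is what the slowest 1% of requests experience.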
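To see why the average hides the tail, here is a minimal sketch that computes percentiles with the nearest-rank method. The `percentile` function is a hypothetical helper and the sample data is made up to match the figures above; monitoring tools may use other interpolation methods and report slightly different values.

```php
<?php
declare(strict_types=1);

/**
 * Nearest-rank percentile: the smallest sample such that at least $p percent
 * of all samples are less than or equal to it.
 *
 * @param float[] $samples latencies in milliseconds
 */
function percentile(array $samples, float $p): float
{
    if ($samples === [] || $p <= 0 || $p > 100) {
        throw new InvalidArgumentException('need samples and 0 < p <= 100');
    }
    sort($samples); // sorts the local copy only
    $rank = (int) ceil($p * count($samples) / 100); // 1-based rank
    return (float) $samples[$rank - 1];
}

// Made-up data: 90 fast requests, 8 slow ones, 2 very slow ones.
$latencies = array_merge(
    array_fill(0, 90, 50.0),
    array_fill(0, 8, 500.0),
    array_fill(0, 2, 2000.0)
);

$mean = array_sum($latencies) / count($latencies);

printf(
    "mean=%.0fms p50=%.0fms p95=%.0fms p99=%.0fms\n",
    $mean,
    percentile($latencies, 50),
    percentile($latencies, 95),
    percentile($latencies, 99)
);
// prints: mean=125ms p50=50ms p95=500ms p99=2000ms
```

The mean of 125 ms describes no actual request here: 90% of requests took 50 ms and the rest took 500 ms or more.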
Code Example
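<?php
// Assumes a standard Laravel app with an Order model in the default namespace
use App\Models\Order;
use Illuminate\Support\Facades\Log;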
// MEASURING LATENCY — how long does one request take?
$start = microtime(true);
// The work being timed: one Eloquent query with eager-loaded relations
$orders = Order::with('items.product')->where('status', 'pending')->get();
$latency = round((microtime(true) - $start) * 1000, 2); // milliseconds
Log::info("Order query latency: {$latency}ms");
// MEASURING THROUGHPUT — requests per second
// Done externally with load testing tools:
// ab -n 1000 -c 50 https://example.com/api/orders
// ApacheBench: 1000 total requests, 50 concurrent
// Output: 127.3 requests/second ← throughput
// 392ms mean time per request ← average latency (50 concurrent / 127.3 RPS ≈ 0.392s)
// p99 = 1250ms ← tail latency (99% of requests were faster; the slowest one can be higher)
// LITTLE'S LAW illustration
// PHP-FPM with 20 workers
// Average request latency = 200ms
// Max throughput = 20 workers / 0.200s = 100 RPS
// To increase throughput:
// 1. Decrease latency: add DB indexes → 100ms → 200 RPS (same 20 workers)
// 2. Add more workers: 40 workers at 200ms → 200 RPS
// PERCENTILES in Laravel monitoring
// Laravel Telescope records each database query with its duration
// Percentiles (p50/p95/p99) normally come from an APM or metrics backend rather than from Octane or Telescope
// Production monitoring (Datadog, New Relic) alert on:
// p95 > 500ms → investigate latency
// RPS drops 50% → investigate throughput
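
Both numbers can also be measured in-process for a single worker. The sketch below is a rough illustration only: `benchmark` is a hypothetical helper (not part of Laravel), and the `usleep` call is a stand-in for real work such as a query. It runs the operation sequentially, so it reflects a concurrency of 1 and is no substitute for an external load test.

```php
<?php
declare(strict_types=1);

/**
 * Runs $operation sequentially and reports mean latency and throughput.
 * Hypothetical helper for illustration.
 *
 * @return array{mean_latency_ms: float, throughput_per_sec: float}
 */
function benchmark(callable $operation, int $iterations): array
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('iterations must be >= 1');
    }

    $latenciesNs = [];
    $runStart = hrtime(true); // monotonic clock, nanoseconds

    for ($i = 0; $i < $iterations; $i++) {
        $t0 = hrtime(true);
        $operation();
        $latenciesNs[] = hrtime(true) - $t0; // latency of this one call
    }

    $elapsedSec = (hrtime(true) - $runStart) / 1e9;

    return [
        'mean_latency_ms'    => array_sum($latenciesNs) / $iterations / 1e6,
        'throughput_per_sec' => $iterations / $elapsedSec,
    ];
}

// Stand-in workload: sleep 2 ms to simulate I/O.
$result = benchmark(static fn () => usleep(2000), 50);

printf(
    "mean latency: %.2f ms, throughput: %.1f ops/sec\n",
    $result['mean_latency_ms'],
    $result['throughput_per_sec']
);
```

With one sequential worker, throughput comes out close to 1000 / mean latency in ms, which is Little's Law with a concurrency of 1. `hrtime` is used instead of `microtime` because it is monotonic and unaffected by system clock adjustments.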