Latency vs throughput — time for one request vs requests per second

Concept

Latency and throughput are two distinct performance dimensions: they are often confused, but they measure different things.

Latency: The time it takes to complete ONE operation. "How long does this request take?" Measured in milliseconds (ms) or microseconds (µs). Examples: response time of a single API call, time for one DB query.

Throughput: How many operations can be completed per unit of time. "How many requests per second can this handle?" Measured in requests/second (RPS), transactions/second (TPS), queries/second (QPS).
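To make the two definitions concrete, here is a minimal sketch in plain PHP (no framework). It times each call of a hypothetical `work()` function, which just hashes a string in a loop as a stand-in for one operation, and reports both the average latency and the throughput of a sequential run. `work()` and `measure()` are illustrative names, not part of any library.

```php
<?php
// Minimal sketch: time each call of a hypothetical work() function
// (latency) and count completed calls per second (throughput).

function work(): string
{
    // Stand-in for "one operation": repeatedly hash a string.
    $h = 'seed';
    for ($i = 0; $i < 1000; $i++) {
        $h = md5($h);
    }
    return $h;
}

/**
 * Runs $n operations back to back (one in flight at a time) and returns
 * [average latency in ms, throughput in operations per second].
 */
function measure(int $n): array
{
    $latenciesMs = [];
    $runStart = hrtime(true);                  // monotonic clock, nanoseconds
    for ($i = 0; $i < $n; $i++) {
        $opStart = hrtime(true);
        work();
        $latenciesMs[] = (hrtime(true) - $opStart) / 1e6;   // ns -> ms
    }
    $elapsedS = (hrtime(true) - $runStart) / 1e9;              // ns -> s

    $avgLatencyMs = array_sum($latenciesMs) / $n;
    $throughput = $n / $elapsedS;              // operations completed per second
    return [$avgLatencyMs, $throughput];
}

[$avgMs, $opsPerSec] = measure(200);
printf("avg latency: %.3f ms, throughput: %.0f ops/s\n", $avgMs, $opsPerSec);
```

Because only one operation is in flight at a time here, throughput can be at most roughly 1000 / (average latency in ms); running operations concurrently is what lets throughput exceed that bound.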

The relationship: They're related, but you can't infer one from the other:

A system can have LOW latency (fast responses) but LOW throughput (can only handle a few at a time).
A system can have HIGH throughput (handles thousands/second) but HIGH latency (each one takes a long time — batch processing).
The ideal: low latency AND high throughput.
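The batch-processing case can be illustrated with a toy cost model. The fixed 10 ms per-batch overhead and 1 ms per-item cost below are assumed numbers, not measurements, and `batchTimeMs()` and the other functions are hypothetical helpers. Each item is treated as finished only when its whole batch finishes, and time spent waiting for a batch to fill is ignored.

```php
<?php
// Hypothetical cost model (an assumption, not a measurement):
// every batch pays a fixed overhead (e.g. one network round trip)
// plus a per-item processing cost.
const OVERHEAD_MS = 10.0;
const PER_ITEM_MS = 1.0;

/** Time to process one batch of $size items, in milliseconds. */
function batchTimeMs(int $size): float
{
    return OVERHEAD_MS + $size * PER_ITEM_MS;
}

/** Every item in a batch is done only when the whole batch is done. */
function itemLatencyMs(int $size): float
{
    return batchTimeMs($size);
}

/** Items completed per second by one worker processing batches back to back. */
function throughputPerSec(int $size): float
{
    return $size / (batchTimeMs($size) / 1000);
}

foreach ([1, 10, 100] as $size) {
    printf(
        "batch=%3d  latency=%6.1f ms  throughput=%6.1f items/s\n",
        $size,
        itemLatencyMs($size),
        throughputPerSec($size)
    );
}
```

With these assumed costs, growing the batch from 1 to 100 items raises per-item latency from 11 ms to 110 ms while throughput climbs from about 91 to about 909 items/s: the fixed overhead is amortized, but every item waits for the whole batch.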

Little's Law: Throughput = Concurrency / Latency (equivalently, Concurrency = Throughput × Latency). If your average response time is 100ms and you have 10 concurrent workers that are always busy, the maximum throughput is 10 / 0.1s = 100 RPS. To double throughput, either halve latency OR double concurrency.
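Little's Law is simple enough to express as two helper functions. This is a sketch: `maxThroughputRps()` and `workersNeeded()` are hypothetical names, and both assume every worker is continuously busy, so the results are an upper bound on throughput and a lower bound on the worker count rather than predictions.

```php
<?php
// Little's Law rearranged for a pool of always-busy workers:
//   throughput (req/s) = concurrency / latency (s)
// The result is a ceiling: real systems fall short once a shared
// resource (CPU, database, locks) saturates.

function maxThroughputRps(int $concurrency, float $latencyMs): float
{
    if ($latencyMs <= 0) {
        throw new InvalidArgumentException('latency must be positive');
    }
    // Same as concurrency / (latencyMs / 1000), multiplying first to keep values exact.
    return $concurrency * 1000 / $latencyMs;
}

/** Minimum number of always-busy workers needed to sustain $targetRps at $latencyMs. */
function workersNeeded(float $targetRps, float $latencyMs): int
{
    return (int) ceil($targetRps * $latencyMs / 1000);
}

echo maxThroughputRps(10, 100), " RPS\n";    // 10 workers at 100 ms
echo maxThroughputRps(10, 50), " RPS\n";     // halve latency
echo maxThroughputRps(20, 100), " RPS\n";    // double concurrency
echo workersNeeded(100, 200), " workers\n";  // 100 RPS at 200 ms
```

The two "double throughput" options in the paragraph above show up as the second and third calls: both give 200 RPS.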

Why they're confused: "Slow" can mean either. "The API is slow" might mean:

High latency (a single request takes 2 seconds) — user experience problem.
Low throughput (can only handle 5 RPS before degrading) — scaling problem.

Optimizing latency vs throughput:

Latency: Optimize the critical path of a single request (DB indexes, caching, code optimization).
Throughput: Add more workers/servers (horizontal scaling), use async processing, connection pooling.

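Percentiles: Average latency is misleading because it blends fast and slow requests into one number that describes neither. Example: p50=50ms (the median: half of requests finish within 50ms), p95=500ms (95% of requests finish within 500ms), p99=2000ms (99% finish within 2000ms). The p99 describes the slowest 1% of requests, which is what users on the tail actually experience.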
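The gap between the average and the tail is easy to reproduce. The sketch below uses the nearest-rank percentile definition (one of several in use) on a hypothetical sample of 100 latencies chosen to match the example values above; `percentile()` is an illustrative helper, not a library function.

```php
<?php
/**
 * Nearest-rank percentile: the smallest sample such that at least
 * $p percent of all samples are less than or equal to it.
 */
function percentile(array $samplesMs, float $p): float
{
    if ($samplesMs === [] || $p < 0 || $p > 100) {
        throw new InvalidArgumentException('need samples and 0 <= p <= 100');
    }
    sort($samplesMs);                                          // ascending (local copy)
    $rank = max(1, (int) ceil($p * count($samplesMs) / 100)); // 1-based rank
    return (float) $samplesMs[$rank - 1];
}

// Hypothetical sample of 100 request latencies: mostly fast, with a slow tail.
$latencies = array_merge(
    array_fill(0, 90, 50),    // 90 requests at 50 ms
    array_fill(0, 8, 500),    // 8 requests at 500 ms
    array_fill(0, 2, 2000)    // 2 requests at 2000 ms
);

$avg = array_sum($latencies) / count($latencies);
printf(
    "avg=%.0f ms  p50=%.0f ms  p95=%.0f ms  p99=%.0f ms\n",
    $avg,
    percentile($latencies, 50),
    percentile($latencies, 95),
    percentile($latencies, 99)
);
```

The mean comes out at 125 ms, which is neither what a typical request sees (50 ms) nor what the slowest 1% see (2000 ms). Monitoring tools often interpolate between samples or approximate percentiles from histograms, so their figures can differ slightly from nearest-rank results.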

Code Example

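<?php
// Assumes a Laravel app with the default App\Models namespace.
use App\Models\Order;
use Illuminate\Support\Facades\Log;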
// MEASURING LATENCY — how long does one request take?
$start = microtime(true);

// ... do work ...
$orders = Order::with('items.product')->where('status', 'pending')->get();

$latency = round((microtime(true) - $start) * 1000, 1); // seconds → milliseconds
Log::info("Order query latency: {$latency}ms");

// MEASURING THROUGHPUT — requests per second
// Done externally with load testing tools:
// ab -n 1000 -c 50 https://example.com/api/orders
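// ApacheBench: 1000 total requests, 50 concurrent
// Output excerpt (illustrative numbers):
//   Requests per second: 127.3 [#/sec] (mean)   ← throughput
//   Time per request:    392 [ms] (mean)        ← average latency
//   99%   1250 (percentile table, ms)           ← p99 latency (not the worst case; that is the 100% row)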

// LITTLE'S LAW illustration
// PHP-FPM with 20 workers
// Average request latency = 200ms
// Max throughput = 20 workers / 0.200s = 100 RPS

// To increase throughput:
// 1. Decrease latency: add DB indexes → 100ms → 200 RPS (same 20 workers)
// 2. Add more workers: 40 workers at 200ms → 200 RPS (only if CPU and DB have headroom)

// PERCENTILES in Laravel monitoring
// Laravel Telescope records request and query durations
// p50/p95/p99 views usually come from an APM or metrics backend

// Production monitoring (Datadog, New Relic) alert on:
// p95 > 500ms → investigate latency
// RPS drops 50% → investigate throughput
