Generators vs arrays — when to use each (memory)
Concept
A generator is a function that uses yield to produce a sequence of values on demand — one at a time — rather than computing and storing the entire sequence in memory first. PHP's generator functions return a Generator object that implements Iterator, so they can be used in foreach loops.
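The on-demand behaviour described above can be observed directly. The following is a minimal sketch, assuming a hypothetical `countTo()` helper that writes to a log array; it suggests that calling a generator function runs none of its body, and that execution alternates between the generator and the consuming loop.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not from the original text): logs each step of the body.
function countTo(int $n, array &$log): Generator
{
    $log[] = 'body started';
    for ($i = 1; $i <= $n; $i++) {
        $log[] = "yielding $i";
        yield $i; // execution pauses here until the consumer asks for more
    }
    $log[] = 'body finished';
}

$log = [];
$gen = countTo(2, $log);                // returns a Generator; the body has not run
$isIterator = $gen instanceof Iterator; // Generator implements Iterator
$logAfterCall = $log;                   // snapshot taken before any iteration

foreach ($gen as $value) {
    $log[] = "received $value";
}

print_r($log);
```

The log interleaves "yielding" and "received" entries, which is the suspend-and-resume cycle in action.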
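The key difference from arrays: an array holds all values simultaneously in memory. A generator holds only the current value plus the suspended function state (local variables, the position in the code). For a sequence of 1 million integers, an array takes roughly 16 to 32 MB depending on the PHP version (packed arrays became more compact in PHP 8.2), while the generator object stays on the order of 1 KB no matter how many values it will produce.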
When generators outperform arrays:
- Sequences too large to fit in memory (log file processing, large CSV imports)
- Infinite sequences (sequence IDs, event streams)
- Lazy computation chains where many values are filtered before consumption
- Pipeline architectures where each stage processes one item at a time
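As an illustration of the infinite-sequence and lazy-filter cases in the list above, here is a small sketch. The names `sequenceIds()`, `onlyEven()`, and `take()` are hypothetical helpers invented for this example, not built-in PHP functions.

```php
<?php
declare(strict_types=1);

// Hypothetical infinite ID source: it never terminates on its own.
function sequenceIds(int $start = 1): Generator
{
    for ($id = $start; ; $id++) {
        yield $id;
    }
}

// Lazy filter stage: only even values pass through.
function onlyEven(iterable $items): Generator
{
    foreach ($items as $item) {
        if ($item % 2 === 0) {
            yield $item;
        }
    }
}

// Hypothetical helper: yields at most $limit items, then stops pulling from the source.
function take(iterable $items, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$limit === 0) {
            return;
        }
    }
}

// Only as many IDs as needed are ever generated, even though the source is infinite.
$firstEvenIds = iterator_to_array(take(onlyEven(sequenceIds()), 3), false);
print_r($firstEvenIds);
```

An array-based version of `sequenceIds()` could not exist at all, since the full sequence would have to be built before filtering could start.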
When arrays are better:
- When you need random access ($arr[500])
- When you need to know the count upfront
- When you need to iterate the same sequence multiple times (generators are single-use; you'd need to recreate them)
- When the sequence is small — the generator overhead isn't worth it
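When array features such as random access or a count are needed, one option is to materialize the generator once with `iterator_to_array()`. The sketch below uses a hypothetical `letters()` generator, and also shows what happens when a finished generator is iterated a second time.

```php
<?php
declare(strict_types=1);

// Hypothetical three-item generator used only for this illustration.
function letters(): Generator
{
    yield 'a';
    yield 'b';
    yield 'c';
}

// Materialize once to get array features (at the cost of holding everything in memory).
$all = iterator_to_array(letters(), false);
$count = count($all);
$second = $all[1];

// A generator is single-use: iterating a finished one throws an Exception.
$gen = letters();
foreach ($gen as $letter) {
    // consume every value
}

$secondPassFailed = false;
try {
    foreach ($gen as $letter) {
        // not reached
    }
} catch (Exception $e) {
    $secondPassFailed = true;
}

var_dump($count, $second, $secondPassFailed);
```

To iterate again, call `letters()` a second time to get a fresh generator.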
Code Example
<?php
declare(strict_types=1);
// Generator vs array — memory comparison
function rangeArray(int $n): array
{
    return range(1, $n); // allocates full array immediately
}

function rangeGenerator(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield $i; // suspends here, resumes on next()
    }
}

// Memory comparison
$before = memory_get_usage();
$arr = rangeArray(100_000);
echo "Array: " . (memory_get_usage() - $before) . " bytes\n"; // roughly 2-4 MB, depending on PHP version

$before = memory_get_usage();
$gen = rangeGenerator(100_000);
echo "Generator: " . (memory_get_usage() - $before) . " bytes\n"; // on the order of 1 KB

// Process only what you need: early exit is free
foreach (rangeGenerator(1_000_000) as $n) {
    if ($n > 100) {
        break; // stops at the 101st value; the rest are never produced
    }
}

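// Real-world pattern: reading a large file line by line
function readLines(string $filename): Generator
{
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        // Without this check, fgets(false) would throw a TypeError
        throw new RuntimeException("Cannot open file: $filename");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            yield trim($line);
        }
    } finally {
        fclose($fh); // runs on normal completion and when the generator is destroyed early
    }
}
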
// Chaining generators (lazy pipeline)
function filterEmpty(Generator $lines): Generator
{
    foreach ($lines as $line) {
        if ($line !== '') {
            yield $line;
        }
    }
}

function parseCSV(Generator $lines): Generator
{
    foreach ($lines as $line) {
        yield str_getcsv($line);
    }
}

// Generator key-value pairs
function indexedSquares(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield $i => $i ** 2; // yield key => value
    }
}

foreach (indexedSquares(5) as $n => $square) {
    echo "$n² = $square\n";
}
Interview Q&A
Q: What is a PHP generator and how does it differ from an array in terms of memory usage?
A generator is a function using yield that produces values lazily — one at a time, on demand. It suspends execution at each yield, saves its local state, and resumes when the caller requests the next value. An array stores all values in memory at once. For a 1-million-item sequence, an array uses tens of megabytes; a generator uses roughly a kilobyte regardless of how many items it would produce. Generators are ideal for processing large datasets (log files, DB cursors, API pagination) where loading everything into memory at once would be impractical.
Q: What are the limitations of generators compared to arrays?
Generators are forward-only, single-use iterators. You cannot: access elements by index ($gen[5]), rewind and re-iterate without recreating the generator, get the total count without consuming the generator, or pass a generator to functions expecting an array (like array_map). They also add per-item overhead (context switching at each yield) that makes them slower than arrays for small in-memory datasets. Use generators when the dataset doesn't fit in memory or when lazy evaluation is architecturally valuable.
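Because `array_map` only accepts arrays, a lazy pipeline typically needs its own mapping stage. The following is a minimal sketch, assuming hypothetical `lazyMap()` and `numbers()` helpers; it suggests that the callback runs only for the values actually consumed.

```php
<?php
declare(strict_types=1);

// Hypothetical lazy counterpart to array_map: applies $fn one item at a time.
function lazyMap(callable $fn, iterable $items): Generator
{
    foreach ($items as $key => $item) {
        yield $key => $fn($item);
    }
}

// Hypothetical source generator for the demonstration.
function numbers(): Generator
{
    yield 1;
    yield 2;
    yield 3;
}

$calls = 0;
$doubled = lazyMap(
    function (int $x) use (&$calls): int {
        $calls++; // counts how often the callback really runs
        return $x * 2;
    },
    numbers()
);

// Consume only the first value, then stop.
$first = null;
foreach ($doubled as $value) {
    $first = $value;
    break;
}

var_dump($first, $calls);
```

If a real array is required afterwards, `iterator_to_array()` converts the result, at the cost of loading every item into memory.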
Q: How would you use a generator to implement memory-efficient CSV processing in PHP?
Create a generator that opens the file with fopen, yields each row via fgetcsv in a while loop, and closes the file in a finally block. Each iteration of the foreach loop resumes the generator, which reads one line, parses it, and yields the result, so the entire CSV is never in memory at once. You can then chain additional generator functions to filter or transform rows lazily, building a pipeline where each stage processes one row at a time. This pattern handles files of arbitrary size within a fixed memory budget.
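The answer above could be sketched roughly as follows. The function names `readCsvRows()` and `minAmount()`, the two-column layout, and the temporary demo file are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: yields one parsed CSV row at a time.
function readCsvRows(string $filename): Generator
{
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open file: $filename");
    }
    try {
        // Separator, enclosure, and escape are passed explicitly (the historical
        // defaults); adjust them for your data.
        while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv reports a blank line as [null]
            }
            yield $row;
        }
    } finally {
        fclose($fh); // also runs if the consumer stops early
    }
}

// Lazy filter stage: keeps rows whose second column is at least $min.
function minAmount(iterable $rows, float $min): Generator
{
    foreach ($rows as $row) {
        if ((float) $row[1] >= $min) {
            yield $row;
        }
    }
}

// Demo data in a temporary file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "alice,10\n\nbob,250\ncarol,99.5\n");

$names = [];
foreach (minAmount(readCsvRows($path), 50.0) as $row) {
    $names[] = $row[0];
}
unlink($path);

print_r($names);
```

Using fgetcsv on the file handle, rather than splitting lines first and parsing each one separately, also copes with quoted fields that contain line breaks.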