Array performance — copy-on-write, when PHP clones vs references
Concept
PHP arrays are powerful but expensive. Understanding their internal implementation — a hash table with collision chains — explains their memory cost and performance characteristics.
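Since PHP 7, a PHP array is implemented as a HashTable (zend_array) whose elements live in one contiguous array of Bucket structs. Each bucket stores the zval value, the integer key or cached string hash, and a pointer to the string key (NULL for integer keys). Collisions are chained through an index kept inside the value's zval rather than through a separate pointer, and foreach simply walks the bucket array, which is kept in insertion order; the doubly-linked list that PHP 5 threaded through all buckets is gone. A bucket is 32 bytes on 64-bit PHP, plus a few bytes of hash index per slot. Because capacity is always a power of two, the real cost ranges from roughly 40 bytes per element when the table is full to about 80 just after a resize, regardless of whether the value is a small integer. String keys add the cost of the string itself.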
Copy-on-write for arrays: When you assign $b = $a, PHP doesn't copy the array. Both variables point to the same underlying array, whose refcount goes from 1 to 2. Only when you write through one of them does the engine "separate", duplicating the array for the writer. If the array has 1 million elements, the separation copies all 1 million buckets, although the copy is shallow: strings and nested arrays stored in it are still shared by refcount. This is why passing a large array to a function is cheap (no copy) until the function writes to it.
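The separation described above can be observed from userland. The following is a minimal sketch rather than part of the original example: the function name withExtra and the variable names are hypothetical, and the memory deltas are printed without expected values because they vary with PHP version and allocator state.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: receives the array by value and writes to it.
function withExtra(array $data): array
{
    $data[] = 'extra'; // first write separates the local copy from the caller's array
    return $data;
}

$original = array_fill(0, 100_000, 'value');

$before = memory_get_usage();
$alias = $original;                     // shares the same array, no copy yet
$afterAssign = memory_get_usage() - $before;

$result = withExtra($original);         // the write inside duplicates the buckets
$afterWrite = memory_get_usage() - $before;

echo "After assign: {$afterAssign} bytes\n";
echo "After write:  {$afterWrite} bytes\n";
echo count($original), " vs ", count($result), "\n"; // 100000 vs 100001
```

The caller's array keeps its 100,000 elements because the write happened on the separated copy; only the returned array carries the extra element.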
Growth strategy: A PHP array gets a minimum capacity of 8 slots and doubles when full. Doubling reallocates the whole table and, for hash arrays, rebuilds the hash index over every bucket, an O(n) operation. A sequence of $arr[] = $val insertions is therefore amortized O(1) per insert but has O(n) spikes at each doubling.
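One way to see the doubling is to watch for the appends that change the reported memory usage. This is a rough sketch under the assumption that memory_get_usage() moves only when the array's storage is reallocated; the variable names are illustrative, and the exact element counts it prints depend on the PHP version and on whether the array is packed.

```php
<?php
declare(strict_types=1);

$arr = [];
$growthPoints = []; // element counts at which an append changed memory usage

for ($i = 1; $i <= 1_000; $i++) {
    $before = memory_get_usage();
    $arr[] = $i;                 // integer values need no allocation of their own
    $after = memory_get_usage();
    if ($after !== $before) {
        $growthPoints[] = $i;    // recorded outside the measured window
    }
}

echo "Appends: ", count($arr), "\n";
echo "Appends that changed memory usage: ", count($growthPoints), "\n";
echo "At element counts: ", implode(', ', $growthPoints), "\n";
```

Only a handful of the 1,000 appends should show up in the list, which is the amortized pattern: most inserts are free, a few pay for a reallocation.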
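Packed arrays (PHP 7.0+): If an array uses only integer keys inserted in ascending order (0,1,2,…), PHP uses a more compact "packed" representation with no hash index. Unsetting elements leaves holes but does not by itself convert the array; string keys, out-of-order inserts, or large jumps in the key do convert it to a regular hash. Packed slots cost 32 bytes each on PHP 7.0 to 8.1 and 16 bytes each on PHP 8.2+, compared with roughly 40 bytes or more for a hash slot.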
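PHP offers no userland function that reports whether an array is packed, so the sketch below only shows the key shapes involved. That array_values() hands back an array the engine can store packed is an assumption about internals, not something this code can verify.

```php
<?php
declare(strict_types=1);

$list = ['a', 'b', 'c', 'd'];   // keys 0..3 in ascending order
unset($list[1]);                // leaves a gap: keys are now 0, 2, 3
$list['label'] = 'x';           // a string key rules out the packed layout

// Keep only the integer-keyed values and renumber them from zero.
$reindexed = array_values(array_filter(
    $list,
    static fn ($key): bool => is_int($key),
    ARRAY_FILTER_USE_KEY
));

print_r(array_keys($list));      // keys: 0, 2, 3, label
print_r(array_keys($reindexed)); // keys: 0, 1, 2
```

Rebuilding with array_values() after heavy filtering or deletion is a common way to get back to gap-free sequential keys.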
Code Example
<?php
declare(strict_types=1);
// Memory cost comparison
$before = memory_get_usage();
$dense = range(0, 99_999); // Sequential int keys: packed array
echo "Packed: " . (memory_get_usage() - $before) . " bytes\n"; // roughly 2-4 MB, depending on PHP version
$before = memory_get_usage();
$assoc = [];
for ($i = 0; $i < 100_000; $i++) {
    $assoc["key_$i"] = $i; // String keys: full hash table plus one string per key
}
echo "Assoc: " . (memory_get_usage() - $before) . " bytes\n"; // typically 7 MB or more
// COW demonstration: assignment is free, the first write is expensive
$large = array_fill(0, 100_000, 'value');
$before = memory_get_usage();
$copy = $large; // No allocation: both variables share one array, refcount=2
echo "After assign: " . (memory_get_usage() - $before) . "\n"; // ~0
$copy['newkey'] = 'x'; // Triggers separation (full copy); the string key also turns the copy from packed into a hash
echo "After write: " . (memory_get_usage() - $before) . "\n"; // several MB
// Passing arrays to functions
function processArray(array $data): int // pass by value, copy-on-write
{
    return count($data); // only reads, NO separation
}
function appendToArray(array &$data): void // pass by reference: writes go to the caller's array
{
    $data[] = 'new'; // no copy, unless the array is also shared with another variable
}
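// Pre-sizing to avoid repeated reallocation
// Plain arrays have no capacity hint in userland. SplFixedArray is a fixed-size,
// integer-indexed alternative; for a plain array you can create the slots upfront:
// $arr = array_fill(0, 10_000, null); // creates 10k null elements to overwrite by index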
// in_array performance on large arrays: O(n) linear scan
$bigArray = range(1, 100_000);
// Slow: in_array(99999, $bigArray) compares against 99,999 elements before it matches
// Fast: use array_flip + isset for repeated lookups (values must be ints or strings)
$lookup = array_flip($bigArray); // build hash once, O(n)
$found = isset($lookup[99999]); // O(1) on average
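To compare the two lookup strategies directly, a small timing harness along the following lines could be used. It is a sketch under stated assumptions: the variable names and the needle list are made up for illustration, array_fill_keys() stands in as an alternative to array_flip(), and the timings are printed without expected values because they depend on the machine.

```php
<?php
declare(strict_types=1);

$values = range(1, 100_000);
$needles = [1, 50_000, 99_999, 100_001]; // the last one is absent

// array_fill_keys builds a value => true map, an alternative to array_flip.
$set = array_fill_keys($values, true);

$start = hrtime(true);
$linear = [];
foreach ($needles as $needle) {
    $linear[$needle] = in_array($needle, $values, true); // O(n) scan per lookup
}
$linearNs = hrtime(true) - $start;

$start = hrtime(true);
$hashed = [];
foreach ($needles as $needle) {
    $hashed[$needle] = isset($set[$needle]);             // hash lookup per call
}
$hashedNs = hrtime(true) - $start;

echo "in_array: {$linearNs} ns, isset: {$hashedNs} ns\n";
var_dump($linear === $hashed); // bool(true): both strategies agree on membership
```

Building the map costs one O(n) pass and extra memory, so it pays off only when the same array is searched repeatedly.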