Array performance — copy-on-write, when PHP clones vs references

Expert · 5 min read · php-04-018
performance, interview

Concept

PHP arrays are powerful but expensive. Understanding their internal implementation — a hash table with collision chains — explains their memory cost and performance characteristics.

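A PHP array is implemented as a HashTable (zend_array) whose elements are Bucket structs. In PHP 7 and later, each bucket stores the zval value, the integer hash (or the integer key itself), and a pointer to the string key when there is one. Buckets sit in one contiguous array in insertion order, which is what foreach walks, and collisions are chained through an index stored inside the zval rather than through separate list pointers. The older PHP 5 design used per-bucket collision pointers plus a doubly-linked list threading through all buckets in insertion order, and cost well over 100 bytes per element. On 64-bit PHP 7+ the overhead is about 32 bytes per bucket plus a 4-byte hash slot, with string keys adding their own allocation, and it applies regardless of whether the value is a small integer.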
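The insertion-order behaviour described above is visible from userland. A minimal sketch, with variable names chosen only for illustration:

```php
<?php
declare(strict_types=1);

// Keys are added in a deliberately unsorted order.
$a = [];
$a[10]  = 'first';
$a['x'] = 'second';
$a[2]   = 'third';

// array_keys() and foreach walk the buckets in insertion order,
// not in key order.
echo implode(', ', array_keys($a)), "\n"; // prints "10, x, 2"

// Deleting a key and adding it again appends it at the end.
unset($a[10]);
$a[10] = 'again';
echo implode(', ', array_keys($a)), "\n"; // prints "x, 2, 10"
```

Integer and string keys share one ordered structure, which is why a PHP array can act as both a list and a map.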

Copy-on-write for arrays: When you assign $b = $a, PHP doesn't copy the array. Both variables point to the same underlying array, whose refcount becomes 2. Only when you write through one of them does the engine "separate", duplicating the array for the writer. The duplication is shallow but still O(n): if the array has 1 million elements, all 1 million buckets are copied, while strings and nested arrays inside them stay shared until they are themselves written to. This is why passing a large array to a function by value is cheap (no copy) until the function writes to it.
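The function-call case can be sketched as follows. The helper `withExtra()` is hypothetical and exists only to force a write to a by-value parameter; the exact byte count depends on the PHP version, so only its sign is reported.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: receives the array by value and writes to it,
// which makes the engine separate (duplicate) it inside the call.
function withExtra(array $data): array
{
    $data[] = 'extra';   // first write: $data stops sharing with the caller
    return $data;
}

$original = array_fill(0, 50_000, 'value');

$before = memory_get_usage();
$result = withExtra($original);
$grown  = memory_get_usage() - $before;

echo count($original), "\n";   // prints 50000: the caller's array is untouched
echo count($result), "\n";     // prints 50001
echo $grown > 0 ? "a second array was allocated\n" : "no new allocation\n";
```

Had `withExtra()` only read from `$data`, no second array would have been needed.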

Growth strategy: A non-empty PHP array starts with a capacity of 8 slots and doubles when full. Each doubling reallocates the bucket array and, for hash arrays, rebuilds the hash index, which is an O(n) operation. A sequence of $arr[] = $val insertions is therefore amortized O(1) per insert, with an O(n) spike at each doubling.
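One way to observe the doublings, as a rough sketch: record the element count whenever a single append raises memory_get_usage(). It assumes the default Zend memory manager is active; the variable names are illustrative.

```php
<?php
declare(strict_types=1);

$arr = [];
$growthPoints = [];   // element counts at which memory usage went up

for ($i = 0; $i < 100_000; $i++) {
    $m0 = memory_get_usage();
    $arr[] = $i;                  // integers are stored inline in the bucket,
    $m1 = memory_get_usage();     // so only a resize should raise memory here
    if ($m1 > $m0) {
        $growthPoints[] = count($arr);   // recorded outside the measured window
    }
}

echo count($arr), " appends, ", count($growthPoints), " resizes observed\n";
echo implode(' ', $growthPoints), "\n";
```

On a typical build the recorded counts should sit just past successive powers of two, a few dozen at most for 100,000 appends, which is the amortized O(1) pattern in practice.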

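Packed arrays (PHP 7.0+): If an array uses only integer keys in ascending order, typically 0,1,2,…, PHP uses a more compact "packed" representation that drops the hash index. Through PHP 8.1 a packed element still occupies a full 32-byte bucket, so the saving over a hash array is small; since PHP 8.2 a packed element is a bare 16-byte zval. Adding a string key, or an integer key far out of sequence, converts the array to the full hash representation.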
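The difference between the two layouts can be probed by building the same keys and values in two insertion orders. This is a sketch under one assumption: that keys inserted in descending order make the engine fall back to the hash layout. The absolute numbers vary by PHP version.

```php
<?php
declare(strict_types=1);

$n = 10_000;

// Ascending keys 0..n-1: eligible for the packed layout.
$before = memory_get_usage();
$ascending = [];
for ($i = 0; $i < $n; $i++) {
    $ascending[$i] = $i;
}
$ascendingBytes = memory_get_usage() - $before;

// Same keys and values, inserted in descending order: assumed here to
// use the full hash layout because the keys do not arrive in ascending order.
$before = memory_get_usage();
$descending = [];
for ($i = $n - 1; $i >= 0; $i--) {
    $descending[$i] = $i;
}
$descendingBytes = memory_get_usage() - $before;

printf("ascending:  %d bytes\n", $ascendingBytes);
printf("descending: %d bytes\n", $descendingBytes);
```

Both arrays hold identical data, so any difference in the reported sizes comes from the internal representation alone.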

Code Example

php
<?php
declare(strict_types=1);

// Memory cost comparison
$before = memory_get_usage();
$dense = range(0, 99_999);                  // Sequential int keys — packed array
echo "Packed:   " . (memory_get_usage() - $before) . " bytes\n"; // roughly 2-4 MB, depending on PHP version

$before = memory_get_usage();
$assoc = [];
for ($i = 0; $i < 100_000; $i++) {
    $assoc["key_$i"] = $i;                  // String keys: full hash table plus one string per key
}
echo "Assoc:    " . (memory_get_usage() - $before) . " bytes\n"; // ~7MB+

// COW demonstration — assignment is free, modification is expensive
$large = array_fill(0, 100_000, 'value');
$before = memory_get_usage();
$copy = $large;                              // No allocation — COW, refcount=2
echo "After assign:  " . (memory_get_usage() - $before) . "\n"; // ~0

$copy['newkey'] = 'x';                      // Triggers separation: all buckets are duplicated
echo "After write:   " . (memory_get_usage() - $before) . "\n"; // several MB, version-dependent

// Passing arrays to functions
function processArray(array $data): int     // pass by value — copy-on-write
{
    return count($data);                    // only reads, NO separation
}
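// Pass by reference: writes go to the caller's array in place. A copy is
// still made if that array is shared with another variable at the time.
function appendToArray(array &$data): void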
{
    $data[] = 'new';
}

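// Pre-sizing to avoid repeated growth
// PHP has no userland call that reserves capacity for a plain array.
// array_fill() creates real elements up front, so later writes to those
// indexes never trigger a resize:
// $arr = array_fill(0, 10_000, null);  // 10k null elements, not empty slots
// For fixed-size data with integer indexes only, SplFixedArray is an alternative.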

// in_array performance on large arrays — O(n) scan
$bigArray = range(1, 100_000);
// Slow: in_array(99999, $bigArray) scans nearly all 100k elements before it matches
// Fast: use array_flip + isset for repeated lookups (values must be unique ints or strings)
$lookup = array_flip($bigArray); // build hash once, O(n)
$found  = isset($lookup[99999]); // O(1) on average
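
The lookup comparison above can be turned into a small, self-contained benchmark. This is a sketch only: the array sizes and variable names are arbitrary choices, timings vary by machine, and hrtime() requires PHP 7.3 or later.

```php
<?php
declare(strict_types=1);

$haystack = range(1, 20_000);
$needles  = range(19_000, 21_000);   // about half present, half missing

// Linear scan per lookup.
$t0 = hrtime(true);
$hitsScan = 0;
foreach ($needles as $needle) {
    if (in_array($needle, $haystack, true)) {
        $hitsScan++;
    }
}
$scanNs = hrtime(true) - $t0;

// One-time flip, then hash lookups. array_flip() only accepts int or
// string values, and duplicate values collapse to a single key.
$t0 = hrtime(true);
$lookup = array_flip($haystack);
$hitsHash = 0;
foreach ($needles as $needle) {
    if (isset($lookup[$needle])) {
        $hitsHash++;
    }
}
$hashNs = hrtime(true) - $t0;

printf("in_array: %d hits in %.2f ms\n", $hitsScan, $scanNs / 1e6);
printf("isset:    %d hits in %.2f ms\n", $hitsHash, $hashNs / 1e6);
```

Both loops report the same number of hits. The flip costs one O(n) pass and extra memory for the second array, so it pays off only when the same array is searched repeatedly.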