PHP arrays are hash maps — internal structure and implications
Concept
PHP arrays are not arrays in the C or Java sense — they are ordered hash maps implemented as a HashTable structure inside the Zend Engine. Every PHP array, whether indexed or associative, is backed by the same internal data structure: a doubly-linked list of Bucket structs layered on top of a hash table. This is why you can mix integer and string keys in the same array, and why insertion order is always preserved — something that surprised developers coming from C, JavaScript (pre-ES6), or Python dicts before 3.7.
Internally, each Bucket holds the key (either a zend_ulong for integer keys or a zend_string for string keys), the value as a zval, and pointers to the next and previous buckets for both collision chaining and ordered traversal. The hash table itself is a flat C array of uint32_t indices pointing into the Bucket array. When a collision occurs, PHP chains buckets using a singly-linked list embedded inside the Bucket structs. The hash function used for string keys is DJBX33A (the same algorithm used by Zend since PHP 5).
The performance implications of this design are significant. Integer-keyed arrays where keys form a continuous sequence starting from 0 are packed arrays — PHP can skip the hash computation and access elements in near-O(1) time with minimal overhead. Associative arrays with string keys must compute a hash for every lookup, making them slightly slower. Any array that mixes numeric and string keys, or has gaps in numeric keys (e.g., after unset($arr[1])), degrades into a full hash map lookup path. The capacity of the underlying hash table starts at 8 and doubles on overflow, so large arrays cause periodic O(n) rehash operations.
A critical gotcha is that PHP arrays pass by value, not by reference. When you assign one array to another variable, PHP uses copy-on-write (COW) to defer the actual memory copy. The copy only happens at the moment either variable is modified. This means passing a 10,000-element array to a function is cheap — until the function modifies it, at which point PHP performs a full deep clone. Developers who misunderstand COW often either over-optimize (using &$arr everywhere) or under-optimize (modifying arrays in loops without realizing each iteration triggers a copy).
| Feature | PHP Array (HashTable) | C Array | Python list | Python dict |
|---|---|---|---|---|
| Keys | int or string | int offset | int | any hashable |
| Ordered | Yes (insertion order) | No inherent order | Yes | Yes (3.7+) |
| Mixed types | Yes | No | Yes | Yes (values) |
| Lookup time | O(1) avg | O(1) | O(1) | O(1) avg |
| Memory overhead | High (~56 bytes/element) | Very low | Moderate | High |
Code Example
<?php
declare(strict_types=1);
// PHP arrays are hash maps — both of these use the same HashTable internally
$indexed = [10, 20, 30]; // keys: 0, 1, 2
$assoc = ['a' => 1, 'b' => 2]; // keys: 'a', 'b'
$mixed = [0 => 'zero', 'key' => 'value', 2 => 'two']; // all valid
// Integer key casting: '1' becomes 1, '01' stays '01'
$arr = [];
$arr['1'] = 'int-cast'; // stored with key (int)1
$arr['01'] = 'stays str'; // stored with key '01'
var_dump(array_key_exists(1, $arr)); // true
var_dump(array_key_exists('01', $arr)); // true
// Gaps in integer keys force full hash-map path
$packed = [0 => 'a', 1 => 'b', 2 => 'c'];
unset($packed[1]);
// $packed is now [0 => 'a', 2 => 'c'] — no longer packed
// Copy-on-write demonstration
$original = range(1, 1000); // 1000 integers
$alias = $original; // O(1) — just increments refcount, no copy
// The copy happens HERE, at the moment of mutation:
$alias[0] = 9999;
// Internal hash: 'key' → DJBX33A hash → index in bucket array
// Can inspect with spl_object_id() equivalent for arrays: none directly,
// but Xdebug's xdebug_debug_zval() reveals refcount and is_ref
// Memory cost: each bucket is ~56 bytes; an empty array ~336 bytes
$empty = [];
echo memory_get_usage() . PHP_EOL; // baseline
$large = range(1, 10_000);
echo memory_get_usage() . PHP_EOL; // ~560 KB more
// array_key_exists vs isset — critical difference with null values
$data = ['name' => null];
var_dump(isset($data['name'])); // false — null treated as not-set
var_dump(array_key_exists('name', $data)); // true — key exists, value is nullInterview Q&A
Q: Why is array_key_exists() slower than isset() for array lookups, and when must you use array_key_exists() despite the speed difference?
isset() is a language construct compiled to a single opcode that returns false for both missing keys and keys whose value is null. array_key_exists() is a function call with function-call overhead, but it correctly distinguishes between a missing key and a key with a null value. You must use array_key_exists() whenever null is a semantically valid value — for example, in a settings array where null means "explicitly unset" vs a missing key meaning "never configured". In performance-critical hot paths where you control that values are never null, prefer isset().
Q: What happens at the C level when a PHP array exceeds its current capacity, and how does this affect tail-latency in web applications?
When the number of elements in a PHP HashTable exceeds its current capacity (which starts at 8 and doubles each time), the Zend Engine calls zend_hash_rehash(), which allocates a new bucket array of double the size and relinks all existing buckets. This is an O(n) operation and causes a GC pause on the PHP thread. In a web application serving 1,000 RPS, if a hot code path builds an array that crosses the 1,024-to-2,048 boundary on every request, you will see periodic request latency spikes. Mitigation: pre-size arrays with SplFixedArray when the count is known, or build the array in a generator and never materialize it.
Q: A colleague argues that using &$arr (pass by reference) is always faster than passing arrays by value. Is this correct?
It is incorrect in most cases. PHP's copy-on-write mechanism means that passing a large array by value is O(1) if the called function only reads the array — no copy is made. The copy is deferred until mutation. Forcing pass-by-reference bypasses COW and actually pins the array's refcount to 1, which can prevent future COW optimizations and may cause the engine to copy the array earlier in other call sites. Pass by reference is only a genuine win when you intend to mutate the array and want the caller to observe changes — not as a generic performance trick.