String immutability and copy-on-write internals

Expert · 5 min read · php-03-015
performance, compare

Concept

PHP strings behave as immutable values from the caller's point of view: built-in string functions never change their argument, and an operation that appears to "modify" a shared string actually produces a new one. Internally this is managed through reference counting on the underlying zend_string combined with copy-on-write (COW), the same mechanism that governs arrays.

When you write $b = $a where $a is a string, PHP does not allocate a new zend_string. Instead, both $a and $b's zvals point to the same zend_string object, and its refcount increments to 2. Only when you write to $b (e.g., $b .= '!') does PHP "separate" — it decrements the original zend_string's refcount, allocates a fresh one, writes the modification there, and assigns it to $b. The original $a is completely unaffected.
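
The sharing and separation described above can be observed from userland. The following is a minimal sketch, assuming that the built-in debug_zval_dump() prints a refcount(N) marker for non-interned strings. string_refcount() is a hypothetical helper written for this illustration, not a PHP function. The absolute numbers include temporary references held by the function calls themselves, so only the differences between readings are meaningful.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: extracts the refcount printed by debug_zval_dump().
 * Returns null when no refcount is printed (newer PHP versions print
 * "interned" instead for interned strings).
 */
function string_refcount(string $s): ?int
{
    ob_start();
    debug_zval_dump($s);
    $dump = (string) ob_get_clean();

    return preg_match('/refcount\((\d+)\)/', $dump, $m) === 1 ? (int) $m[1] : null;
}

// Built at run time, so it is an ordinary refcounted string, not an interned literal.
$a = bin2hex(random_bytes(16));
$alone = string_refcount($a);

$b = $a;                          // $b shares $a's zend_string
$shared = string_refcount($a);    // expected: one higher than $alone

$b .= '!';                        // the write separates $b from $a
$separated = string_refcount($a); // expected: back to the $alone reading

printf("alone=%d shared=%d separated=%d\n", $alone, $shared, $separated);
var_dump($a === $b);              // the two variables now hold different strings
```

The helper takes its argument by value on purpose: passing a string into a function is itself just a refcount increment, which is why every reading is offset by the same constant.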

Interned strings are a special case: PHP interns (permanently de-duplicates) string literals and identifiers that appear in source code. An interned string has the IS_STR_INTERNED flag set, which disables reference counting for it entirely: it lives for the full request (or, with opcache, for the lifetime of the cached script) and is never freed individually. This is why repeated string constants cost so little memory in PHP.
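
The difference between an interned literal and a runtime-built string can be sketched as follows. This assumes only that debug_zval_dump() reports reference information for strings; the exact text differs between PHP versions (older releases print a constant refcount for interned strings, newer ones print "interned"), so the sketch compares readings rather than relying on specific values. zval_dump_text() is a hypothetical helper introduced here.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the raw text debug_zval_dump() prints for a string.
function zval_dump_text(string $s): string
{
    ob_start();
    debug_zval_dump($s);

    return trim((string) ob_get_clean());
}

$literal = 'hello';                  // compile-time literal: interned
$runtime = bin2hex(random_bytes(8)); // built at run time: refcounted

$literalBefore = zval_dump_text($literal);
$runtimeBefore = zval_dump_text($runtime);

// Add one more variable pointing at each string.
$literalCopy = $literal;
$runtimeCopy = $runtime;

$literalAfter = zval_dump_text($literal);
$runtimeAfter = zval_dump_text($runtime);

// Interned: no reference counting, so the reading should not change.
var_dump($literalBefore === $literalAfter);
// Refcounted: the extra variable should show up as a higher refcount.
var_dump($runtimeBefore === $runtimeAfter);
```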

The performance implications:

  • String functions like strtolower($s) return a new zend_string: $s is unchanged, and a new allocation happens
  • $s .= $chunk in a loop: if $s has refcount 1 (only one variable points to it), PHP 7+ can sometimes resize in-place using realloc — a significant optimization for string building
  • Passing a string to a function by value: only incurs cost if the function actually writes to it (COW), so large strings are cheap to pass by value
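
The pass-by-value point in the list above can be checked with a rough measurement. This is a minimal sketch: cost_of_reading() and cost_of_writing() are hypothetical names chosen for this illustration, and memory_get_usage() deltas are approximate, so the numbers should be read as orders of magnitude rather than exact costs.

```php
<?php
declare(strict_types=1);

// Only reads its argument: the parameter keeps sharing the caller's string.
function cost_of_reading(string $s): int
{
    $before = memory_get_usage();
    $length = strlen($s);            // read-only access, no separation
    return memory_get_usage() - $before;
}

// Writes to its argument: the local $s is separated from the caller's string.
function cost_of_writing(string $s): int
{
    $before = memory_get_usage();
    $s[0] = 'Y';                     // the write forces a private copy
    return memory_get_usage() - $before;
}

$big = str_repeat('x', 1_000_000);   // roughly 1 MB payload

$readCost  = cost_of_reading($big);
$writeCost = cost_of_writing($big);

printf("read-only call: %d bytes, writing call: %d bytes\n", $readCost, $writeCost);
var_dump($big[0]);                   // the caller's string is untouched
```

The copy made inside cost_of_writing() is released when the function returns, so the extra memory is transient, but the time spent copying is still paid on every such call.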

Code Example

php
<?php
declare(strict_types=1);

// Immutability — string functions return new strings
$original = 'Hello World';
$upper    = strtoupper($original);

echo $original, PHP_EOL; // "Hello World" (unchanged)
echo $upper, PHP_EOL;    // "HELLO WORLD" (new string)

// COW demonstration with memory tracking
$big = str_repeat('x', 1_000_000); // 1MB string

$before = memory_get_usage();
$copy   = $big;                            // No allocation, just a refcount bump
$after_assign = memory_get_usage();
echo ($after_assign - $before), PHP_EOL;   // ~0 bytes

$copy[0] = 'Y';                            // Write triggers separation
$after_write = memory_get_usage();
echo ($after_write - $before), PHP_EOL;    // ~1MB: an actual copy was made

// Efficient string building — refcount-1 in-place optimization
$s = '';
for ($i = 0; $i < 100_000; $i++) {
    $s .= 'x'; // PHP can realloc in-place when refcount == 1
}
// $s is the only variable pointing to the string, so PHP avoids extra allocs

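// Contrast with this (a second variable sharing the string defeats the optimization):
$shared = $s; // the zend_string's refcount is now 2
$s .= 'y';    // must separate: a shared string cannot be resized in place
unset($shared);
// Note: a PHP reference ($ref = &$s) does not raise the string's refcount;
// it wraps the value in a zend_reference instead.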

// String interning — identical literals share storage
$a = 'hello';
$b = 'hello';
// Both point to the same interned zend_string (inspect with xdebug_debug_zval('a', 'b') if Xdebug is installed)

// substr() returns a new string (copy on return)
$sub = substr($s, 0, 10);
// $sub is a new zend_string; $s is unmodified