String concatenation performance — the . operator vs sprintf vs interpolation

Intermediate · 5 min read · php-15-005
performance

Concept

PHP string operations differ widely in cost. Knowing when a built-in string function can replace a regex, and which functions are cheap or expensive, helps you avoid common performance bottlenecks.

Key performance principles:

  1. Built-in functions beat regex: str_contains(), str_starts_with(), str_ends_with() (all PHP 8.0+), and strpos() do a plain byte search and skip the regex engine entirely, so they are faster than preg_match() for simple checks. Use them when you don't need pattern matching.
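  2. Be careful with concatenation in loops: $str .= "piece" in a 10,000-iteration loop performs 10,000 appends, and the string buffer may be reallocated as it grows. PHP usually extends the string in place, so the cost is often modest, but when the pieces already sit in an array a single implode() is simpler and typically at least as fast. Measure before restructuring working code.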
  3. sprintf vs concatenation: For simple cases, concatenation is marginally faster. For complex formatting, sprintf is cleaner and comparable in speed.
  4. Regular expression compilation: preg_match() must compile a pattern before it can run it. PHP keeps compiled patterns in a per-process PCRE cache (4096 entries), so a repeated pattern is compiled only once, but generating many unique patterns can thrash the cache. Reuse the same patterns.
  5. mb_ functions are slower: mb_strlen(), mb_substr(), etc. must work through the string in a given character encoding instead of counting or slicing raw bytes. Use them only when you actually need multibyte support.
  6. str_split() vs mb_str_split(): str_split() splits by bytes; mb_str_split() (PHP 7.4+) splits by characters. For ASCII data, always use str_split().
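
One way to check the principles above on your own PHP build is a tiny timing harness. The sketch below is an illustration, not a rigorous benchmark: bench() and the $candidates array are hypothetical names, hrtime() requires PHP 7.3+, and every candidate pays the same closure-call overhead, so only the relative differences mean anything.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: average nanoseconds per call of $fn over $iterations runs.
 * hrtime(true) returns a monotonic nanosecond counter (PHP 7.3+).
 */
function bench(callable $fn, int $iterations = 100000): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $iterations;
}

$name  = 'Alice';
$score = 95.5;

// Three ways to build the same string.
$candidates = [
    'concat'        => fn(): string => 'User: ' . $name . ', Score: ' . $score,
    'interpolation' => fn(): string => "User: $name, Score: $score",
    'sprintf'       => fn(): string => sprintf('User: %s, Score: %s', $name, $score),
];

foreach ($candidates as $label => $fn) {
    printf("%-14s %8.1f ns/call\n", $label, bench($fn));
}
```

Absolute numbers vary with PHP version, OPcache and JIT settings, and hardware, so run it several times before drawing conclusions.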

Code Example

php
<?php
declare(strict_types=1);

$haystack = "Hello, World! This is a test string.";
$needle   = "World";

// FAST: built-in byte-level functions
$found = str_contains($haystack, $needle);    // PHP 8.0+
$found = strpos($haystack, $needle) !== false; // equivalent, slightly more work to write

// SLOW: regex for simple containment
$found = (bool) preg_match('/World/', $haystack); // overkill — don't do this

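// SLOWER: string building in a loop
function buildSlow(array $items): string
{
    $result = '';
    foreach ($items as $item) {
        $result .= "$item, "; // one append per item; the buffer may be reallocated as it grows
    }
    // Drop only the final ", ". rtrim($result, ', ') would also strip a
    // trailing comma or space that belongs to the last item.
    return $result === '' ? '' : substr($result, 0, -2);
}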

// FAST: collect then join
function buildFast(array $items): string
{
    return implode(', ', $items); // one allocation
}

// Regex precompilation cache
function classifyEmailDomains(array $emails): array
{
    $pattern = '/^[^@]+@(gmail|yahoo|outlook)\.com$/i'; // same pattern each call
    return array_filter($emails, fn($e) => preg_match($pattern, $e));
}
// The pattern is compiled once and cached — subsequent calls reuse the compiled form

// str_split vs explode for tokenization
$csv = "Alice,Bob,Charlie";
$parts = explode(',', $csv);           // FAST: delimiter-based
$chars = str_split("Hello", 2);        // ["He", "ll", "o"] — byte chunks

// sprintf vs concatenation
$name = "Alice";
$score = 95.5;
$msg1 = "User: $name, Score: " . number_format($score, 1); // interpolation + concat
$msg2 = sprintf("User: %s, Score: %.1f", $name, $score);   // sprintf
// Both are fine — sprintf is cleaner for multiple substitutions

// mb_ only when needed
$utf8 = "Héllo"; // 5 characters, 6 bytes
echo strlen($utf8), PHP_EOL;          // 6 (bytes, WRONG for character count)
echo mb_strlen($utf8), PHP_EOL;       // 5 (characters, correct)
echo str_split($utf8, 1)[1], PHP_EOL; // a lone 0xC3 byte: the first half of "é", invalid UTF-8 on its own
echo mb_substr($utf8, 1, 1), PHP_EOL; // "é" (correct)
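
The "mb_ only when needed" rule can also be applied at runtime. The following is a minimal sketch, assuming the mbstring extension is loaded and that any non-ASCII input is UTF-8; splitChars() is a hypothetical helper name, not a built-in. Whether the ASCII check pays for itself depends on your data, so treat it as something to measure rather than a guaranteed win.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a string into characters, taking the
 * byte-level path when the input is pure ASCII.
 * Assumes mbstring is loaded and non-ASCII input is UTF-8.
 *
 * @return string[]
 */
function splitChars(string $text): array
{
    if ($text === '') {
        return [];
    }
    // In ASCII every character is exactly one byte, so str_split() is safe.
    if (mb_check_encoding($text, 'ASCII')) {
        return str_split($text);
    }
    // Otherwise split on character boundaries.
    return mb_str_split($text, 1, 'UTF-8');
}

var_dump(splitChars('Hello') === ['H', 'e', 'l', 'l', 'o']); // bool(true)
var_dump(count(splitChars('Héllo')));                        // int(5)
```

The empty-string guard is there because str_split('') returned [''] before PHP 8.2 and returns [] from 8.2 on; handling it explicitly keeps the result consistent across versions.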