
pcntl_fork — process-level parallelism

Advanced · 5 min read · php-11-004

Concept

pcntl_fork() is PHP's interface to the POSIX fork() system call. It is available only on Unix-like systems where the pcntl extension is compiled in. In practice that means the CLI: the PHP manual advises against enabling process control in a web server environment, so do not rely on it under PHP-FPM or other web SAPIs. Fork creates a near-exact duplicate of the calling process: the child gets its own copy of the parent's memory, its open file descriptors, and its execution state, and differs mainly in its PID. The parent and child then execute independently. This is true OS-level parallelism: the scheduler can run both processes at the same time on separate CPU cores, and neither process sees the other's memory writes.

The return value of pcntl_fork() distinguishes parent from child: it returns 0 inside the child, the child's PID inside the parent, and -1 on failure. After fork, the child typically calls pcntl_exec() to replace itself with a new program, or executes a small unit of work and exits with exit(0). The parent calls pcntl_wait() or pcntl_waitpid() to reap the child's exit status and avoid zombie processes.
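
The three return values and the reaping step can be sketched as a small helper. This is a minimal illustration, assuming the pcntl extension is loaded and the script runs from the CLI; runInChild() is a hypothetical name, not part of pcntl.

```php
<?php
declare(strict_types=1);

/**
 * Runs $task in a forked child and returns the child's exit code.
 * runInChild() is a hypothetical helper name used for illustration.
 */
function runInChild(callable $task): int
{
    $pid = pcntl_fork();

    if ($pid === -1) {
        throw new \RuntimeException('Fork failed.');
    }

    if ($pid === 0) {
        // Child: pcntl_fork() returned 0. Run the task, then exit at once
        // so the child never returns into the caller's code.
        exit($task());
    }

    // Parent: $pid is the child's process ID. Block until it exits.
    $status = 0;
    if (pcntl_waitpid($pid, $status) !== $pid) {
        throw new \RuntimeException("waitpid failed for child {$pid}.");
    }

    if (!pcntl_wifexited($status)) {
        throw new \RuntimeException("Child {$pid} was terminated by a signal.");
    }

    return pcntl_wexitstatus($status);
}

$code = runInChild(static fn (): int => 7);
echo "Child exit code: {$code}\n";
```

The task's return value becomes the child's exit status, so it should stay in the 0 to 255 range that an exit status can carry.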

The memory model after fork is copy-on-write (COW): the OS does not immediately duplicate all pages. Both processes share the same physical memory pages, marked read-only. Only when either process writes to a page does the OS copy that page for the writer. In PHP this means even a 100 MB heap can be forked quickly, but heavy writes post-fork expand memory usage rapidly.
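
One visible consequence of this model is that a write in the child never reaches the parent. The following sketch, assuming pcntl is available on the CLI, modifies an array in the child and then checks the parent's copy; the variable names are illustrative only.

```php
<?php
declare(strict_types=1);

$data = ['counter' => 1];

$pid = pcntl_fork();
if ($pid === -1) {
    throw new \RuntimeException('Fork failed.');
}

if ($pid === 0) {
    // Child: this write lands in the child's private copy of the page.
    $data['counter'] = 999;
    exit(0);
}

// Parent: wait for the child, then confirm its own copy is unchanged.
$status = 0;
pcntl_waitpid($pid, $status);

echo "Parent still sees counter = {$data['counter']}\n";
```

The sketch shows isolation, not the page-level copying itself, which happens inside the kernel and is not observable from PHP code.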

Practical patterns: worker pools fork N child processes at startup, and each child runs as an independent event loop or request handler. IPC (inter-process communication) between the parent and its children uses socket pairs (stream_socket_pair), Unix domain sockets, System V message queues (msg_send / msg_receive), or shared memory (shmop_*, shm_*). Redis and database connections must be re-created after fork. Never share a database connection across a fork boundary: the same socket file descriptor would be used by two processes concurrently, corrupting both protocol streams.

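Key limitation: PHP has no built-in OS threads for application code (the pthreads extension is discontinued, and its successor, parallel, is a PECL extension that requires a thread-safe ZTS build). For CLI workloads, running several processes, whether forked or spawned with proc_open, is the only path to true parallelism that ships with PHP.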

Code Example

php
<?php
declare(strict_types=1);

/**
 * Demonstrates pcntl_fork() for parallel CPU-bound work.
 *
 * Scenario: compute the SHA-256 hash of 4 large strings in parallel,
 * each in its own child process, and collect results via socket pairs.
 *
 * NEVER use database connections, singletons, or open sockets
 * across a fork boundary — recreate them in the child.
 */

if (!function_exists('pcntl_fork')) {
    throw new \RuntimeException('pcntl extension required — run this from CLI.');
}

// Work items — in production these would be queue jobs or file chunks.
$workItems = [
    str_repeat('alpha', 100_000),
    str_repeat('beta', 100_000),
    str_repeat('gamma', 100_000),
    str_repeat('delta', 100_000),
];

$pipes   = [];
$pids    = [];
$results = [];

foreach ($workItems as $index => $data) {
    // Create a connected socket pair: the parent reads from $pipe[0],
    // the child writes to $pipe[1]. stream_socket_pair() returns the
    // pair as an array, or false on failure.
    $pipe = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
    if ($pipe === false) {
        throw new \RuntimeException('Failed to create socket pair.');
    }

    $pid = pcntl_fork();

    if ($pid === -1) {
        throw new \RuntimeException('Fork failed.');
    }

    if ($pid === 0) {
        // ---- CHILD PROCESS ----
        fclose($pipe[0]); // Close read end in child.

        // Do the actual work (CPU-bound).
        $hash = hash('sha256', $data);

        fwrite($pipe[1], $hash);
        fclose($pipe[1]);

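        // IMPORTANT: exit here so the child never falls through into the
        // parent's loop. exit() and die() are the same construct: both
        // still run shutdown functions and destructors inherited from
        // the parent.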
        exit(0);
    }

    // ---- PARENT PROCESS ----
    fclose($pipe[1]); // Close write end in parent.
    $pipes[$pid] = $pipe[0];
    $pids[]      = $pid;
}

// Collect results from all children.
foreach ($pids as $pid) {
    $result = stream_get_contents($pipes[$pid]);
    fclose($pipes[$pid]);

    $status = 0;
    pcntl_waitpid($pid, $status); // Reap child — prevents zombie processes.

    if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
        throw new \RuntimeException("Child {$pid} exited abnormally.");
    }

    $results[$pid] = $result;
}

foreach ($results as $pid => $hash) {
    echo "PID {$pid}: {$hash}\n";
}

echo "All children finished.\n";

Interview Q&A

Q: Why must database connections be re-established after pcntl_fork()?

After fork(), the parent and child both hold the same underlying OS file descriptor for the database socket. If both send queries on it concurrently, their bytes interleave on the single connection and corrupt the protocol stream. The database server cannot tell the two writers apart: it sees garbled protocol data and typically closes the connection. The correct pattern is to never use an inherited connection in the child. Drop every inherited database connection, Redis connection, and other stateful socket immediately after the fork, then open fresh connections in the child before doing any work. Where possible, connect only after forking, because with some drivers a child that closes an inherited handle (explicitly or at exit) can also end the server-side session the parent is still using. PDO, MySQLi, and most PHP database drivers have no automatic fork-safe mechanism, so this is entirely the developer's responsibility.
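
A connect-after-fork worker might look like the sketch below. It assumes pcntl on the CLI; forkWorker() and both callables are hypothetical names, and a plain object stands in for a real driver handle such as a PDO instance.

```php
<?php
declare(strict_types=1);

/**
 * Forks a worker that opens its own connection after the fork.
 * forkWorker() and both callables are hypothetical, for illustration.
 *
 * @param callable(): object    $connect Opens a brand-new connection.
 * @param callable(object): int $work    Uses it and returns an exit code.
 */
function forkWorker(callable $connect, callable $work): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new \RuntimeException('Fork failed.');
    }

    if ($pid === 0) {
        // Child: connect here, after the fork, so the socket belongs
        // to this process alone.
        $connection = $connect();
        exit($work($connection));
    }

    return $pid; // Parent: the caller is responsible for reaping.
}

// Stand-in for something like `new PDO($dsn, $user, $password)`.
$connect = static function (): object {
    $connection = new \stdClass();
    $connection->ownerPid = getmypid();
    return $connection;
};

// Exit 0 only if the connection was created inside the child itself.
$work = static fn (object $c): int => $c->ownerPid === getmypid() ? 0 : 1;

$pid = forkWorker($connect, $work);
$status = 0;
pcntl_waitpid($pid, $status);
echo 'Worker exit code: ' . pcntl_wexitstatus($status) . "\n";
```

Passing a factory instead of a live connection keeps the parent from ever holding a handle that the child could inherit.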


Q: What is a zombie process and how does pcntl_waitpid() prevent it?

When a child process exits, the OS keeps a small record of its exit status in the process table until the parent retrieves it. If the parent never calls wait() / waitpid(), that record stays for as long as the parent runs: this is a zombie process (state Z in ps output). A zombie has already released its memory, but it still occupies a process table entry and a PID, so a parent that keeps leaking them can eventually exhaust the PIDs available on the system. pcntl_waitpid($pid, $status) tells the kernel "I am collecting the exit status for child $pid", and the kernel then removes the entry. For long-running parent processes that fork many children, either use pcntl_signal(SIGCHLD, SIG_IGN) so the kernel reaps children automatically (at the cost of losing their exit statuses), or register a SIGCHLD handler that calls pcntl_waitpid(-1, $status, WNOHANG) in a loop.
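
The SIGCHLD handler approach can be sketched as follows, assuming pcntl on the CLI and PHP 7.1 or later for pcntl_async_signals(). The child count and the exit codes are arbitrary values chosen for the demonstration.

```php
<?php
declare(strict_types=1);

// Run signal handlers as signals arrive, without declare(ticks=1).
pcntl_async_signals(true);

$exitCodes = [];

pcntl_signal(SIGCHLD, static function (int $signo) use (&$exitCodes): void {
    // Several children may exit before a single SIGCHLD is handled,
    // so keep reaping until no exited child is left.
    $status = 0;
    while (pcntl_waitpid(-1, $status, WNOHANG) > 0) {
        $exitCodes[] = pcntl_wexitstatus($status);
    }
});

$children = 3;
for ($i = 0; $i < $children; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new \RuntimeException('Fork failed.');
    }
    if ($pid === 0) {
        exit($i); // Child: exit with a distinct code.
    }
}

// Parent: a real worker manager would do useful work here instead
// of sleeping. The handler fills $exitCodes in the background.
while (count($exitCodes) < $children) {
    usleep(10_000);
}

sort($exitCodes);
echo 'Reaped exit codes: ' . implode(', ', $exitCodes) . "\n";
```

The inner loop matters because signals of the same type are not queued: three children exiting close together may produce only one handler call.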


Q: When should you use pcntl_fork() instead of a message queue (like Laravel's queue workers)?

pcntl_fork() is appropriate when the work is tightly coupled to data already in the parent's memory (e.g., a large in-memory dataset you want to partition and process in parallel), when the latency of serialization and a network round-trip to a queue broker is unacceptable, or when writing low-level tooling such as process pool managers and parallel task runners. Queue workers (Redis, SQS) are better when work needs to survive process crashes, when you need distributed execution across multiple servers, when retry semantics matter, or when the work is generated by one process and consumed by another that starts later. Forking is a sharp tool: it is fast and simple for batch CLI workloads but brittle in web-facing code.