How does PHP execute code? Parsing → AST → opcode → zend_execute

Expert · 5 min read · eng-09-001

interview

Concept

When PHP executes a script, it does not interpret your source code line by line at runtime. Instead, it compiles the source through a multi-stage pipeline every time a file is loaded, unless OPcache already holds a compiled copy of that file. Understanding this pipeline is essential for answering questions about performance, bytecode caching, and why certain reflection tricks work the way they do.

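The stages are: lexing (tokenisation), parsing into an Abstract Syntax Tree (AST), compilation of the AST into Zend opcodes, and finally execution of those opcodes by the Zend VM. Each stage is conceptually discrete and can be inspected independently. In practice, however, the parser pulls tokens from the lexer on demand rather than lexing the whole file first. The AST became a first-class intermediate structure in PHP 7.0. Before that, the PHP 5 parser emitted opcodes directly from its grammar actions, which is why AST tooling such as the php-ast extension requires PHP 7 or later.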
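The lexing stage is the easiest one to observe from userland. The sketch below assumes PHP 8.0+ with the bundled tokenizer extension enabled, which is the usual setup. It prints the tokens the parser would receive for the same expression used in the code example, and it skips tokens the parser ignores (whitespace, comments, the open tag).

```php
<?php
declare(strict_types=1);

// Stage 1 (lexing) seen from userland: PhpToken::tokenize() (PHP 8.0+,
// tokenizer extension) uses the engine's own scanner.
$source = '<?php $result = $a + $b * $c;';

$names = [];
foreach (PhpToken::tokenize($source) as $token) {
    if ($token->isIgnorable()) {
        continue; // whitespace, comments and the open tag never reach the parser
    }
    // Single-character tokens such as "=" report the character as their name.
    $names[] = $token->getTokenName();
    printf("%-12s %s\n", $token->getTokenName(), $token->text);
}
```

Note that operator precedence is absent at this point. The token stream is flat, and only the parser decides that `*` binds tighter than `+`.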

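At execution time the Zend VM dispatches opcodes one at a time inside execute_ex, which is reached through the zend_execute_ex function pointer that profilers and debuggers can override. It follows jumps for branches and loops. Each instruction reads and writes numbered slots (compiled variables and temporaries) in the current zend_execute_data frame, so the VM behaves like a register-based machine. The frames themselves are pushed onto the VM stack, which is a chain of heap-allocated pages. A function call pushes a new frame and a return pops it. Userland-to-userland calls are handled inside the same executor loop rather than by C recursion, and this has two consequences: debug_backtrace() can walk the frame chain, and recursion depth for plain PHP functions is bounded by memory_limit rather than by the C stack. Calls routed through internal functions (for example, callbacks invoked by array_map()) do re-enter the executor recursively and consume C stack.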
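A quick way to see the frame stack from PHP code is to count the frames debug_backtrace() reports at the bottom of a recursion. This is an illustrative sketch, and the function name frameCountAtDepth is made up for the example.

```php
<?php
declare(strict_types=1);

// Each userland call pushes one frame; debug_backtrace() walks that chain,
// so the number of entries grows by one per recursive call.
function frameCountAtDepth(int $n): int
{
    if ($n === 0) {
        // IGNORE_ARGS keeps the snapshot small; only the count matters here.
        return count(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS));
    }

    return frameCountAtDepth($n - 1);
}

echo frameCountAtDepth(3), "\n"; // 4: one frame each for n = 3, 2, 1, 0
```

Debugging extensions can impose their own nesting limits (Xdebug's xdebug.max_nesting_level, for example). Very deep recursion experiments may therefore stop earlier than memory_limit alone would suggest.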

Code Example

php

<?php
declare(strict_types=1);

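// Opcodes can be inspected with the VLD extension, phpdbg (-p option) or OPcache's
// debug output. The php-ast extension shows the AST, not opcodes.
// This illustrates what the compiler produces for a simple expression.

// Source: $result = $a + $b * $c;

// Approximate opcode sequence in VLD-like notation (~N is a temporary):
// MUL       ~0, $b, $c       ; temp ~0 = b * c
// ADD       ~1, $a, ~0       ; temp ~1 = a + (~0)
// ASSIGN    $result, ~1      ; result = ~1

// You can dump opcodes at the CLI without installing VLD. OPcache is off for the
// CLI by default, so enable it; 0x10000 dumps before optimisation, 0x20000 after:
// php -d opcache.enable_cli=1 -d opcache.opt_debug_level=0x10000 script.php

// The AST can be inspected from userland with nikic/php-parser (v5, via Composer).
// Its nodes mirror, but are not identical to, the engine's internal zend_ast tree.
require __DIR__ . '/vendor/autoload.php';

$parser = (new \PhpParser\ParserFactory())->createForNewestSupportedVersion();
$ast = $parser->parse('<?php $result = $a + $b * $c;');

// Prints the tree: Stmt_Expression > Expr_Assign > Expr_BinaryOp_Plus, etc.
echo (new \PhpParser\NodeDumper())->dump($ast), "\n";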
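The compiler's job in that listing can be sketched in a few lines: walk the AST children-first and give every intermediate result a fresh temporary. The ToyCompiler class and its nested-array AST format are hypothetical teaching devices. They are far simpler than zend_compile.c, with no compiled-variable slots, literals table or jump targets, but the output has the same shape as the MUL/ADD/ASSIGN sequence above.

```php
<?php
declare(strict_types=1);

// Hypothetical toy compiler (not zend_compile.c): a post-order walk over a
// nested-array AST that emits three-address ops with numbered temporaries.
final class ToyCompiler
{
    /** @var list<string> */
    private array $ops = [];
    private int $nextTemp = 0;

    /** @return list<string> */
    public function compileAssign(string $target, array|string $expr): array
    {
        $value = $this->compileExpr($expr);
        $this->ops[] = sprintf('ASSIGN $%s, %s', $target, $value);
        return $this->ops;
    }

    /** Returns the operand (variable or temporary) that holds the node's value. */
    private function compileExpr(array|string $node): string
    {
        if (is_string($node)) {
            return '$' . $node; // leaf: a variable name, no op emitted
        }
        [$op, $left, $right] = $node;
        $l = $this->compileExpr($left);  // children first (post-order)
        $r = $this->compileExpr($right);
        $temp = '~' . $this->nextTemp++;
        $this->ops[] = sprintf('%s %s, %s, %s', $op, $temp, $l, $r);
        return $temp;
    }
}

// $result = $a + $b * $c; the parser has already encoded precedence in the shape.
$ast = ['ADD', 'a', ['MUL', 'b', 'c']];
$ops = (new ToyCompiler())->compileAssign('result', $ast);
echo implode("\n", $ops), "\n";
```

Precedence never needs handling in the compiler. The tree shape already places MUL below ADD, so a post-order walk emits MUL first.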

Interview Q&A

Q: Walk me through what happens between php script.php and the first line of your script executing. Be specific about each stage and what data structures are produced.

The CLI SAPI initialises the Zend Engine, which sets up the memory manager and the executor globals. The source file is read and handed to the lexer (zend_language_scanner, generated by re2c), which turns it into tokens. The parser (zend_language_parser, a LALR(1) grammar generated by Bison) pulls those tokens on demand and builds an AST: a tree of zend_ast nodes allocated in a temporary arena. The compiler (zend_compile.c) then walks the AST once and emits a zend_op_array per function, method and file body, each holding its opcode sequence and a literals table; a finalisation pass (pass_two) then resolves jump targets. Once the top-level zend_op_array is compiled, the engine calls zend_execute with it, which creates the initial zend_execute_data frame. Your first line of user code runs from there. Under PHP-FPM (or mod_php) with OPcache enabled, the compiled op_arrays are stored in shared memory on the first request for a file, and the lex/parse/compile stages are skipped on later requests. For the CLI, OPcache is disabled by default (opcache.enable_cli=0), and even when enabled its in-memory cache lives only as long as the process.
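One observable consequence of this ordering is that a whole compilation unit is lexed, parsed and compiled before any of it runs. The sketch below uses eval() as a stand-in for loading a file, since eval() sends a string through the same scanner, parser and compiler. runSnippet is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: compile and run $code, capturing its output and any
 * parse error. Nothing executes unless the whole snippet compiles.
 *
 * @return array{output: string, error: ?string}
 */
function runSnippet(string $code): array
{
    $error = null;
    ob_start();
    try {
        eval($code); // lex + parse + compile the whole string, then execute it
    } catch (ParseError $e) {
        $error = $e->getMessage();
    }
    $output = (string) ob_get_clean();

    return ['output' => $output, 'error' => $error];
}

// The syntax error is in the second statement, yet the first one never runs.
$broken = runSnippet('echo "first statement\n"; $x = ;');
var_dump($broken['output']); // string(0) ""
echo $broken['error'], "\n"; // the parser's syntax error message

$ok = runSnippet('echo "first statement\n";');
var_dump($ok['output'] === "first statement\n"); // bool(true)
```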

Q: Where exactly does OPcache plug into this pipeline, and what does it cache versus what must it recompute?

OPcache hooks in by replacing the engine's zend_compile_file handler, so it wraps the lex/parse/compile stages. On a cache miss it lets the normal compiler run. It then optimises the resulting zend_op_array, together with the class and function entries the file declares, and copies the result into a shared memory segment mapped by every worker forked from the same FPM master. On later requests it checks the file's modification time with a stat() call. This check happens at most once every opcache.revalidate_freq seconds, and not at all when opcache.validate_timestamps=0. If the file is unchanged, OPcache hands the cached op_array to the executor, which runs it largely in place from shared memory, skipping lexing, parsing and compilation entirely. What OPcache does not cache is runtime state: global variables, static property values assigned at runtime, and code compiled by eval(). Before storing, the optimizer (controlled by the opcache.optimization_level bitmask) runs passes such as constant folding, SSA-based type inference and dead code elimination. Cached opcodes are therefore already optimised bytecode, not a one-to-one translation of your source.
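The revalidation rule can be modelled as a small pure function. This is a simplified, hypothetical model for reasoning about stale-cache windows, not OPcache's actual C implementation. It ignores details such as opcache.file_update_protection, and needsRecompile is an invented name.

```php
<?php
declare(strict_types=1);

/**
 * Simplified, hypothetical model of OPcache's revalidation decision
 * (not OPcache's real code). All times are Unix timestamps in seconds.
 */
function needsRecompile(
    bool $validateTimestamps, // opcache.validate_timestamps
    int $revalidateFreq,      // opcache.revalidate_freq
    int $now,                 // current request time
    int $lastChecked,         // when this file's mtime was last stat()ed
    int $cachedMtime,         // mtime recorded when the file was cached
    int $currentMtime         // mtime a stat() would report now
): bool {
    if (!$validateTimestamps) {
        return false; // never looks at the filesystem; only explicit invalidation helps
    }
    if ($now - $lastChecked < $revalidateFreq) {
        return false; // checked recently: serve the cached op_array without a stat()
    }
    return $currentMtime !== $cachedMtime; // recompile only if the file changed
}

// A deploy at t=100 changes the mtime from 50 to 100; the last check was at t=99.
var_dump(needsRecompile(true, 2, 100, 99, 50, 100));  // bool(false): inside the window
var_dump(needsRecompile(true, 2, 101, 99, 50, 100));  // bool(true): window elapsed
var_dump(needsRecompile(false, 2, 500, 99, 50, 100)); // bool(false): timestamps ignored
```

The first two calls show why, even with timestamp validation on, a deploy can appear not to take effect for up to revalidate_freq seconds. The third shows why only an explicit reset or invalidation helps once validate_timestamps=0.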

Q: How would you debug a situation where OPcache seems to be serving stale code after a deployment?

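First confirm OPcache is actually active and check the settings that govern invalidation: opcache.validate_timestamps, opcache.revalidate_freq and opcache.file_update_protection. If validate_timestamps=0 (common in production for performance), the cache never checks the filesystem, so you must invalidate explicitly. Use opcache_reset() for the whole cache, or opcache_invalidate($path, true) for a single file, where true forces invalidation even if the timestamp looks unchanged. Both calls must run inside the PHP-FPM pool that owns the cache. A deployment script that runs php -r 'opcache_reset();' only touches the CLI process's own, usually disabled, cache. Instead, use an HTTP endpoint guarded by a secret, a FastCGI client that talks to the FPM socket directly (such as cachetool), or a reload of PHP-FPM. On a multi-server setup, each server (and each separate FPM master) has its own shared memory segment. opcache_reset() therefore clears only the cache of the server that handled the call, so you need to hit every server. The opcache_get_status() function reports whether OPcache is enabled and, for each cached file, its hit count and timestamp. opcache_is_script_cached($path) answers the single-file question. Both are useful for confirming that a specific file was actually invalidated.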
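These checks can be bundled into a small helper served by the FPM pool itself. Everything here is a hypothetical sketch: the function names are invented, and the shared-secret check assumes you supply the expected token from your own configuration and the caller's token from a request header. The opcache_* and ini_get() calls are the real OPcache and core APIs. The code degrades gracefully when OPcache is not loaded.

```php
<?php
declare(strict_types=1);

// Hypothetical deploy helpers. Serve them from the same PHP-FPM pool whose cache
// you want to inspect; a CLI process has its own, separate cache.

/** Summarise the invalidation-related settings and whether $path is cached. */
function opcacheDiagnostics(string $path): array
{
    if (!function_exists('opcache_get_status')) {
        return ['loaded' => false]; // OPcache extension not loaded at all
    }
    $status = opcache_get_status(false); // false: omit the per-script list

    return [
        'loaded' => true,
        'enabled' => is_array($status) && !empty($status['opcache_enabled']),
        'validate_timestamps' => ini_get('opcache.validate_timestamps'),
        'revalidate_freq' => ini_get('opcache.revalidate_freq'),
        'file_update_protection' => ini_get('opcache.file_update_protection'),
        'cached' => opcache_is_script_cached($path),
    ];
}

/** Reset the whole cache only if the caller presents the shared secret. */
function resetIfAuthorised(?string $providedToken, string $expectedToken): bool
{
    if ($expectedToken === '' || $providedToken === null
        || !hash_equals($expectedToken, $providedToken)) {
        return false; // constant-time comparison; reject missing or wrong tokens
    }

    return function_exists('opcache_reset') && opcache_reset();
}

print_r(opcacheDiagnostics(__FILE__));
```

In an endpoint, you might call resetIfAuthorised($_SERVER['HTTP_X_DEPLOY_TOKEN'] ?? null, (string) getenv('DEPLOY_TOKEN')). Both the header name and the environment variable name are placeholders.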
