Compiling SELECT — assembling the SQL string from state
Concept
Compiling a SELECT statement means walking the builder's state arrays in a fixed order and producing a valid SQL string. The canonical clause order is: SELECT columns FROM table JOIN clauses WHERE conditions GROUP BY columns HAVING conditions ORDER BY columns LIMIT n OFFSET m. Each clause is only emitted if the corresponding state is non-empty. The compiler must also handle the edge case where $columns is ['*'] — it emits SELECT * rather than backtick-quoting the asterisk as a column name.
Column quoting is a subtle correctness issue. Production query builders wrap every identifier (table name, column name) in dialect-specific quotes: backticks for MySQL, double-quotes for PostgreSQL and SQLite. This prevents collisions with SQL reserved words — a table named order or group will break an unquoted query. Our compiler uses backticks and accepts that this couples it to MySQL/SQLite. Laravel's Grammar system solves this by injecting a grammar object whose quoteColumn() method returns the correct delimiter for the active connection.
The compiler returns two things: the SQL string (for debugging and logging) and the bindings array (for PDO). These are kept separate because PDO's prepare() + execute() workflow requires them to be delivered at different times. The compiler should never concatenate binding values directly into the SQL string — that would negate the entire purpose of prepared statements and open a SQL injection vector.
Aggregate functions (COUNT, MAX, MIN, SUM, AVG) are handled by temporarily replacing the columns list with the aggregate expression. The simplest approach is a dedicated compileAggregate() method that overrides the column output when an $aggregate property is set on the builder. This is how Illuminate\Database\Query\Grammars\Grammar::compileAggregate() works.
Code Example
<?php
declare(strict_types=1);
namespace Lumen\Database;
/**
* SELECT compiler for SelectQuery.
*
* Reads the state held by SelectQuery and produces a SQL string
* plus a flat named-binding array suitable for PDO::execute().
*
* Laravel equivalent: Illuminate\Database\Query\Grammars\Grammar
*/
class QueryCompiler
{
/**
* Compile a full SELECT statement from the builder's state.
*
* @return array{sql: string, bindings: array<string, mixed>}
*/
public function compileSelect(SelectQuery $query): array
{
$parts = [];
// 1. SELECT clause
$parts[] = $this->compileColumns($query);
// 2. FROM clause
$parts[] = 'FROM ' . $this->quoteIdentifier($query->getTable());
// 3. JOIN clauses
if ($query->getJoins()) {
$parts[] = $this->compileJoins($query->getJoins());
}
// 4. WHERE clause
if ($query->getWheres()) {
$parts[] = $this->compileWheres($query->getWheres());
}
// 5. GROUP BY clause
if ($query->getGroups()) {
$parts[] = 'GROUP BY ' . implode(', ', array_map(
fn(string $col) => $this->quoteIdentifier($col),
$query->getGroups()
));
}
// 6. ORDER BY clause
if ($query->getOrderBys()) {
$parts[] = $this->compileOrderBys($query->getOrderBys());
}
// 7. LIMIT / OFFSET
if ($query->getLimitValue() !== null) {
$parts[] = 'LIMIT ' . $query->getLimitValue();
}
if ($query->getOffsetValue() !== null) {
$parts[] = 'OFFSET ' . $query->getOffsetValue();
}
return [
'sql' => implode(' ', $parts),
'bindings' => $query->getBindings(),
];
}
// ----------------------------------------------------------------
// Clause compilers
// ----------------------------------------------------------------
private function compileColumns(SelectQuery $query): string
{
$columns = $query->getColumns();
if ($columns === ['*']) {
return 'SELECT *';
}
$quoted = array_map(
fn(string $col) => $this->compileColumn($col),
$columns
);
return 'SELECT ' . implode(', ', $quoted);
}
/**
* Compile a single column expression.
* Supports:
* 'column' → `column`
* 'table.column' → `table`.`column`
* 'column AS alias' → `column` AS `alias`
* 'COUNT(*)' → passed through as-is (raw expression)
*/
private function compileColumn(string $column): string
{
// Raw expressions: contain ( ) or spaces without AS
if (str_contains($column, '(') || str_contains($column, ' ')) {
return $column;
}
// Table-qualified: table.column
if (str_contains($column, '.')) {
[$table, $col] = explode('.', $column, 2);
return $this->quoteIdentifier($table) . '.' . $this->quoteIdentifier($col);
}
return $this->quoteIdentifier($column);
}
private function compileWheres(array $wheres): string
{
$parts = [];
foreach ($wheres as $i => $where) {
$sql = match ($where['type']) {
'basic' => $this->compileWhereBasic($where),
'in' => $this->compileWhereIn($where, false),
'not_in' => $this->compileWhereIn($where, true),
'null' => $this->compileWhereNull($where, false),
'not_null' => $this->compileWhereNull($where, true),
default => throw new \RuntimeException("Unknown where type: {$where['type']}"),
};
// First condition: no boolean prefix. Subsequent: AND/OR.
$parts[] = ($i === 0 ? '' : $where['boolean'] . ' ') . $sql;
}
return 'WHERE ' . implode(' ', $parts);
}
private function compileWhereBasic(array $where): string
{
return sprintf(
'%s %s %s',
$this->quoteIdentifier($where['column']),
$where['operator'],
$where['value'] // named placeholder, e.g. :where_0
);
}
private function compileWhereIn(array $where, bool $not): string
{
$keyword = $not ? 'NOT IN' : 'IN';
$placeholders = implode(', ', $where['values']);
return sprintf(
'%s %s (%s)',
$this->quoteIdentifier($where['column']),
$keyword,
$placeholders
);
}
private function compileWhereNull(array $where, bool $not): string
{
$keyword = $not ? 'IS NOT NULL' : 'IS NULL';
return $this->quoteIdentifier($where['column']) . ' ' . $keyword;
}
private function compileJoins(array $joins): string
{
$parts = [];
foreach ($joins as $join) {
$type = strtoupper($join['type']);
$parts[] = sprintf(
'%s JOIN %s ON %s %s %s',
$type,
$this->quoteIdentifier($join['table']),
$this->compileColumn($join['first']),
$join['operator'],
$this->compileColumn($join['second'])
);
}
return implode(' ', $parts);
}
private function compileOrderBys(array $orderBys): string
{
$parts = array_map(
fn(array $order) => $this->quoteIdentifier($order['column']) . ' ' . $order['direction'],
$orderBys
);
return 'ORDER BY ' . implode(', ', $parts);
}
// ----------------------------------------------------------------
// Aggregate helper
// ----------------------------------------------------------------
/**
* Compile a SELECT with an aggregate function replacing the columns.
*
* @return array{sql: string, bindings: array<string, mixed>}
*/
public function compileAggregate(SelectQuery $query, string $function, string $column = '*'): array
{
// Temporarily override columns with the aggregate expression
$expr = strtoupper($function) . '(' . ($column === '*' ? '*' : $this->quoteIdentifier($column)) . ')';
// We build a partial SQL manually to replace the SELECT clause
$base = $this->compileSelect($query);
// Replace "SELECT *" or "SELECT `cols`" with the aggregate
$sql = preg_replace(
'/^SELECT .+? FROM/U',
'SELECT ' . $expr . ' FROM',
$base['sql'],
limit: 1
);
return ['sql' => $sql, 'bindings' => $base['bindings']];
}
// ----------------------------------------------------------------
// Identifier quoting
// ----------------------------------------------------------------
/**
* Wrap an identifier in backticks (MySQL/SQLite dialect).
* Passes through '*' unquoted.
*/
private function quoteIdentifier(string $identifier): string
{
if ($identifier === '*') {
return '*';
}
// Already quoted
if (str_starts_with($identifier, '`')) {
return $identifier;
}
return '`' . str_replace('`', '``', $identifier) . '`';
}
}Interview Q&A
Q: Why does the compiler emit named placeholders (:where_0) rather than concatenating the actual values into the SQL string?
Concatenating values into SQL is the definition of a SQL injection vulnerability. PDO's prepared statement mechanism works in two phases: prepare() sends the SQL template to the database engine, which parses and compiles a query plan; execute() binds the actual values as data, not as SQL tokens. The database engine never interprets binding values as SQL syntax. By keeping the SQL string and the bindings array separate, the compiler guarantees that no user-supplied value can alter the structure of the query. This is the same reason Illuminate\Database\Query\Builder maintains a $bindings array that is only merged with the SQL string at the PDO layer.
Q: How does Laravel's Grammar class differ from this compiler?
Laravel separates the query builder (Illuminate\Database\Query\Builder) from the SQL generator (Illuminate\Database\Query\Grammars\Grammar). The builder accumulates state; the grammar compiles it. There is a base Grammar class and dialect-specific subclasses: MySqlGrammar, PostgresGrammar, SQLiteGrammar, SqlServerGrammar. Each overrides only the methods where the dialect differs — for example, MySqlGrammar::compileInsertOrIgnore() uses INSERT IGNORE INTO while PostgresGrammar uses INSERT INTO ... ON CONFLICT DO NOTHING. This Strategy pattern means adding a new database dialect only requires subclassing Grammar, not modifying the builder.
Q: What is the UNION all non-obvious case for clause ordering in a SELECT?
UNION queries present an ordering challenge because ORDER BY and LIMIT apply to the entire union result, not individual member queries. MySQL requires that ORDER BY and LIMIT appear only at the end of the last UNION member, and member queries must not have their own ORDER BY unless wrapped in a subquery. Laravel handles this in Grammar::compileUnions() by appending the ORDER BY and LIMIT after all UNION clauses. A simple clause-order compiler that emits ORDER BY directly after the FROM clause would produce invalid SQL for UNION queries.