Reading large files with generators — memory-efficient processing

Advanced · 5 min read · php-13-009
performance

Concept

PHP provides path manipulation functions for extracting components from file paths, resolving relative paths, and building paths relative to the current file. They are essential for portable code that does not depend on the current working directory or on a particular directory layout.

Path component functions:

  • dirname(string $path, int $levels = 1): Returns the parent directory. dirname('/var/www/app/config.php') returns '/var/www/app'. Pass $levels = 2 to go up two levels.
  • basename(string $path, string $suffix = ''): Returns the final component. basename('/var/www/app/config.php') returns 'config.php'. With $suffix: basename('/var/www/app/config.php', '.php') returns 'config'.
  • pathinfo(string $path, int $flags = PATHINFO_ALL): Returns an associative array, or a single component as a string when a specific flag is passed. Keys: dirname, basename, filename (basename without the extension), extension. Pass PATHINFO_EXTENSION to get just the extension.
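
A short sketch of how these three functions behave on less obvious inputs: a file name with two dots and a file name with no extension. The paths are hypothetical and do not need to exist, since these functions only operate on the string.

```php
<?php
declare(strict_types=1);

// Hypothetical path, used only for illustration; it does not need to exist.
$archive = '/srv/backups/site.tar.gz';

$parent      = dirname($archive);          // '/srv/backups'
$grandparent = dirname($archive, 2);       // '/srv'
$trimmed     = basename($archive, '.gz');  // 'site.tar'

// Only the part after the LAST dot counts as the extension.
$info = pathinfo($archive);
// $info['extension'] === 'gz', $info['filename'] === 'site.tar'

// With no dot in the name, the 'extension' key is absent from the array,
// while the single-component form returns an empty string.
$noExt        = pathinfo('/srv/backups/README');
$hasExtension = array_key_exists('extension', $noExt);                  // false
$extension    = pathinfo('/srv/backups/README', PATHINFO_EXTENSION);    // ''
```

Checking for the key (or using the PATHINFO_EXTENSION form) avoids an "undefined array key" warning when a file has no extension.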

Path resolution:

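  • realpath(string $path): Resolves .., ., and symlinks to an absolute canonical path. Returns false if the path doesn't exist. Use it as a path traversal defense: canonicalize both the allowed base directory and the user-supplied path, check both results for false, then require that the user path starts with the base path followed by a directory separator, e.g. str_starts_with(realpath($userPath), realpath($allowedBase) . DIRECTORY_SEPARATOR). A bare strpos() comparison is not enough: it also matches sibling directories such as /var/www/uploads_private, and a match at position 0 is easily confused with false.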
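  • __DIR__: Magic constant holding the directory of the file in which it is written (for an included file, that file's directory, not the entry script's). Absolute and unaffected by the current working directory. Use __DIR__ . '/config.php' instead of a bare 'config.php'.
  • __FILE__: Magic constant holding the absolute path of the file in which it is written, with symlinks resolved.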
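
The behaviour of realpath() on existing and missing paths can be sketched as follows. The example builds a throwaway directory under the system temp directory (the directory and file names are arbitrary choices for this sketch) and removes it afterwards.

```php
<?php
declare(strict_types=1);

// Scratch directory for the demo; the name is arbitrary.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'path_demo_' . bin2hex(random_bytes(4));
mkdir($base . DIRECTORY_SEPARATOR . 'public', 0777, true);
touch($base . DIRECTORY_SEPARATOR . 'settings.ini');

// The temp dir itself may sit behind a symlink, so canonicalize the base too.
$canonicalBase = realpath($base);

// Existing target: '..' is collapsed and an absolute path comes back.
$resolved = realpath($base . '/public/../settings.ini');

// Missing target: realpath() returns false rather than a "cleaned" string.
$missing = realpath($base . '/public/../nope.ini');

var_dump($resolved === $canonicalBase . DIRECTORY_SEPARATOR . 'settings.ini'); // bool(true)
var_dump($missing);                                                            // bool(false)

// Clean up the scratch files.
unlink($base . DIRECTORY_SEPARATOR . 'settings.ini');
rmdir($base . DIRECTORY_SEPARATOR . 'public');
rmdir($base);
```

Because the return type is string|false, callers should compare against false explicitly before passing the result to string functions.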

Code Example

php
<?php
declare(strict_types=1);

$path = '/var/www/app/public/../config/database.php';

echo dirname($path);                         // '/var/www/app/public/../config'
echo dirname($path, 2);                      // '/var/www/app/public/..'
echo basename($path);                        // 'database.php'
echo basename($path, '.php');                // 'database'

$info = pathinfo($path);
// [
//   'dirname'   => '/var/www/app/public/../config',
//   'basename'  => 'database.php',
//   'extension' => 'php',
//   'filename'  => 'database',
// ]

echo pathinfo($path, PATHINFO_EXTENSION); // 'php'
echo pathinfo($path, PATHINFO_FILENAME);  // 'database'

// realpath — resolves .. and symlinks, requires path to exist
$real = realpath('/var/www/app/public/../config/database.php');
// '/var/www/app/config/database.php'

// Path traversal defense
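function safeReadFile(string $baseDir, string $userInput): string
{
    // realpath() returns false when the base directory does not exist
    $realBase = realpath($baseDir);
    if ($realBase === false) {
        throw new \RuntimeException("Base directory not found: $baseDir");
    }
    $realBase = rtrim($realBase, DIRECTORY_SEPARATOR);
    $fullPath = realpath($realBase . DIRECTORY_SEPARATOR . $userInput);

    // Reject missing targets and anything that resolves outside the base directory
    if ($fullPath === false || !str_starts_with($fullPath, $realBase . DIRECTORY_SEPARATOR)) {
        throw new \InvalidArgumentException("Path traversal attempt: $userInput");
    }

    // file_get_contents() returns false on failure, which would violate the string return type
    $contents = file_get_contents($fullPath);
    if ($contents === false) {
        throw new \RuntimeException("Unable to read file: $fullPath");
    }
    return $contents;
}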

// '../../etc/passwd' resolves to a path outside the base directory (or to false
// if that path does not exist), so the check above rejects it either way
// safeReadFile('/var/www/uploads', '../../etc/passwd'); // throws

// __DIR__ — robust relative path resolution
$configPath = __DIR__ . '/../config/app.php'; // relative to this file, not CWD
$config = require $configPath;

// Build paths portably (avoid hardcoding /)
$paths = [
    __DIR__,
    'config',
    'database.php',
];
$joined = implode(DIRECTORY_SEPARATOR, $paths);
// On Windows, if __DIR__ is 'C:\var\www': C:\var\www\config\database.php
// On Linux, if __DIR__ is '/var/www':     /var/www/config/database.php
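
safeReadFile appends a directory separator to the base directory before the prefix check. The sketch below shows why, using hypothetical paths that are assumed to be already canonical (as realpath() might return them on Linux); no filesystem access is involved.

```php
<?php
declare(strict_types=1);

// Hypothetical, already-canonical paths used only for illustration.
$base    = '/var/www/uploads';
$inside  = '/var/www/uploads/avatar.png';
$sibling = '/var/www/uploads_private/secret.txt';

// Naive prefix check: the sibling directory shares the prefix and slips through.
$naiveSibling = str_starts_with($sibling, $base);         // true (wrongly accepted)

// Appending the separator anchors the match at a directory boundary.
$strictSibling = str_starts_with($sibling, $base . '/');  // false (rejected)
$strictInside  = str_starts_with($inside, $base . '/');   // true (accepted)
```

str_starts_with() requires PHP 8.0 or later; on older versions, strncmp($path, $prefix, strlen($prefix)) === 0 is an equivalent check.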