Chunking large datasets — chunk(), chunkById(), lazy(), cursor()

Advanced · 5 min read · lv-12-020

sql · performance · interview

Concept

Processing large datasets is one of the most common sources of memory exhaustion in Laravel applications. Eloquent provides four methods that handle results in bounded batches or as a stream instead of loading every row at once.

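chunk(int $count, callable $callback): Runs one LIMIT/OFFSET query per batch of $count rows and passes each batch to the callback as a Collection. Once the callback returns, nothing references that batch any more, so PHP can free it and peak memory stays at roughly one batch. Returning false from the callback stops further chunks. Suitable for work that is done batch by batch.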

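chunkById(int $count, callable $callback, $column = null): Like chunk, but paginates on a key column (the model's primary key when $column is null) rather than LIMIT/OFFSET. OFFSET-based pagination degrades on large tables because the database must read and discard every skipped row. chunkById orders by the key and adds a WHERE id > (last ID seen) condition, so each query can seek straight to its starting point through the index. Prefer chunkById over chunk for large tables, and whenever the callback updates the rows being iterated.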

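lazy(int $chunkSize = 1000): Returns a LazyCollection. Internally it runs the query in chunks the same way chunk() does (LIMIT/OFFSET); lazyById() is the key-based counterpart and the safer choice when rows are updated during iteration. Lets you use fluent collection methods (filter, map) without loading all results into memory. The collection is evaluated lazily: results are fetched one chunk at a time as you iterate.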

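cursor(): Runs a single query and returns a LazyCollection backed by a PHP generator that hydrates one model at a time as rows are fetched through PDO. Only one Eloquent model is held in memory at a time, but the query stays open for the duration and relationships cannot be eager loaded. With a buffered driver (the default for MySQL), PDO still buffers the raw result set in PHP memory, so a very large result can exhaust memory anyway; lazy() is the safer choice in that case. Use for linear, single-pass processing.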
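The "fetched one chunk at a time as you iterate" behaviour can be sketched in plain PHP with a generator. This is an illustration of the idea only, not Laravel's implementation: fetchPage() and lazyRows() are hypothetical names, and an in-memory array stands in for the database table.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a paged database query: returns up to $size
 * rows starting at $offset and counts how often it is called.
 */
function fetchPage(array $table, int $offset, int $size, int &$queries): array
{
    $queries++;
    return array_slice($table, $offset, $size);
}

/**
 * Yields rows one at a time while fetching them page by page, so at most
 * one page of rows is held in memory at any moment.
 */
function lazyRows(array $table, int $size, int &$queries): Generator
{
    $offset = 0;
    do {
        $page = fetchPage($table, $offset, $size, $queries);
        foreach ($page as $row) {
            yield $row;
        }
        $offset += $size;
    } while (count($page) === $size); // a short page means the data is exhausted
}

$table = range(1, 10); // ten fake "rows"
$queries = 0;
$seen = [];

foreach (lazyRows($table, 3, $queries) as $row) {
    $seen[] = $row;
    if ($row === 4) {
        break; // stop early: the remaining pages are never fetched
    }
}
```

Stopping at the fourth row costs two page fetches rather than four, which is the practical difference between a lazy sequence and loading everything up front.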

When to use which:

chunk: batch operations, running jobs per batch.
chunkById: preferred over chunk on large tables.
lazy: when you want collection methods on large datasets.
cursor: a single query and a single forward pass, with one model hydrated at a time.

Code Example

php

<?php
use App\Models\Order;

// chunk: processes 500 rows at a time
// (assumes processPayment() leaves `status` unchanged; otherwise use chunkById)
Order::where('status', 'pending')
    ->chunk(500, function(\Illuminate\Database\Eloquent\Collection $orders) {
        foreach ($orders as $order) {
            $order->processPayment();
        }
        // Once the callback returns, nothing references this batch, so PHP can free it
    });

// chunkById: safer for large tables (no OFFSET degradation)
// Queries: ... ORDER BY id LIMIT 500, then ... AND id > {last id seen} ORDER BY id LIMIT 500, etc.
Order::where('status', 'pending')
    ->chunkById(500, function($orders) {
        $orders->each->processPayment();
    });

// IMPORTANT: Don't use chunk() when the callback updates a column the query
// filters on (here: status). Updated rows drop out of the result set, the
// OFFSET no longer lines up, and rows get skipped.
// Use chunkById() instead: it continues from the last seen ID, not from an offset.

// lazy() — LazyCollection with fluent methods
Order::lazy(500)
    ->filter(fn($o) => $o->total > 100)
    ->each(fn($o) => ProcessLargeOrder::dispatch($o->id));

// lazy() with where constraint
$totalRevenue = Order::where('status', 'completed')
    ->lazy()
    ->sum('total'); // summed in PHP chunk by chunk; for a plain total, sum('total') on the query itself lets the database do it

// cursor() — generator, minimal memory
foreach (Order::cursor() as $order) {
    ProcessOrder::dispatch($order->id);
}

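// Memory comparison (10,000 rows; illustrative figures only, actual usage
// depends on model size and on how the driver buffers results):
// Order::all()     → ~50MB (everything in memory)
// chunk(1000)      → ~5MB  (1000 rows at a time)
// cursor()         → ~0.5MB (1 hydrated model at a time, 1 query held open)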

// Queue jobs in chunks to avoid memory spikes
Order::where('status', 'unfulfilled')
    ->chunkById(200, function($orders) {
        foreach ($orders as $order) {
            FulfillOrderJob::dispatch($order->id); // dispatch by ID, not model
        }
    });
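The skipped-rows warning above is easier to see in a small simulation. The sketch below is plain PHP rather than Eloquent: an array stands in for the orders table, makeOrders() and pending() are hypothetical helpers, and the two loops only imitate the paging strategies of chunk() (offset) and chunkById() (last seen ID) with a batch size of 2.

```php
<?php
declare(strict_types=1);

/** Builds six fake orders with ids 1..6, all 'pending'. */
function makeOrders(): array
{
    $orders = [];
    foreach (range(1, 6) as $id) {
        $orders[$id] = ['id' => $id, 'status' => 'pending'];
    }
    return $orders;
}

/** Pending orders sorted by id, like WHERE status = 'pending' ORDER BY id. */
function pending(array $orders): array
{
    $rows = array_filter($orders, fn (array $o) => $o['status'] === 'pending');
    return array_values($rows);
}

// OFFSET-style paging (the chunk() approach): page n starts at offset n * size.
$orders = makeOrders();
$offsetProcessed = [];
for ($page = 0; ; $page++) {
    $batch = array_slice(pending($orders), $page * 2, 2);
    if ($batch === []) {
        break;
    }
    foreach ($batch as $row) {
        $orders[$row['id']]['status'] = 'paid'; // row leaves the filtered set
        $offsetProcessed[] = $row['id'];
    }
}

// Keyset-style paging (the chunkById() approach): continue after the last seen id.
$orders = makeOrders();
$keysetProcessed = [];
$lastId = 0;
while (true) {
    $after = array_filter(pending($orders), fn (array $o) => $o['id'] > $lastId);
    $batch = array_slice(array_values($after), 0, 2);
    if ($batch === []) {
        break;
    }
    foreach ($batch as $row) {
        $orders[$row['id']]['status'] = 'paid';
        $keysetProcessed[] = $row['id'];
        $lastId = $row['id'];
    }
}
```

With offset paging, $offsetProcessed ends up as [1, 2, 5, 6]: after the first batch is marked paid, orders 3 and 4 slide into the positions the second query skips. The keyset loop processes all six orders, because its starting point does not depend on how many rows still match the filter.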
