Data providers — @dataProvider for parameterized tests

Intermediate · 5 min read · php-14-006

Concept

Data providers let you run the same test method with multiple sets of inputs without copy-pasting test methods. This is parameterized testing, and it's the right tool whenever you have a function with a finite, enumerable set of meaningful input/output pairs—particularly for edge cases, boundary conditions, and error paths.

A data provider is a public static method (PHPUnit 10+ requires static) that returns an iterable: an array of arrays, a generator, or any iterable. Each inner array is one test case, and its elements are passed as arguments to the test method in order. PHPUnit creates a separate test run for each case, reporting them individually so a failure on one case doesn't hide failures on others.
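
As a rough mental model of the paragraph above, the runner iterates the provider and spreads each case's elements into the test method's parameters, recording every case separately. The sketch below is plain PHP and is not PHPUnit's actual implementation; `additionCases` and `runWithProvider` are hypothetical names used only for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical provider: each inner array is one case, and its
 * elements map to the test parameters in order.
 */
function additionCases(): array
{
    return [
        'two positives' => [1, 2, 3],
        'with zero'     => [0, 5, 5],
        'negatives'     => [-2, -3, -5],
    ];
}

/**
 * Simplified stand-in for the runner: call $test once per case,
 * unpacking the case into positional arguments, and record each
 * outcome separately so one failure does not hide the others.
 *
 * @return array<string, string> case label => 'pass' or a failure message
 */
function runWithProvider(iterable $cases, callable $test): array
{
    $results = [];
    foreach ($cases as $label => $arguments) {
        try {
            $test(...$arguments);
            $results[(string) $label] = 'pass';
        } catch (\Throwable $e) {
            $results[(string) $label] = 'fail: ' . $e->getMessage();
        }
    }

    return $results;
}

$results = runWithProvider(
    additionCases(),
    function (int $a, int $b, int $expected): void {
        if ($a + $b !== $expected) {
            throw new \RuntimeException("expected {$expected}, got " . ($a + $b));
        }
    }
);
```

Because `runWithProvider` accepts any `iterable`, the same loop would work unchanged with a generator-based provider.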

The #[DataProvider('methodName')] attribute (PHPUnit 10+) links a test method to its provider. The older @dataProvider annotation still works in PHPUnit 10 and 11 but is deprecated in favor of the attribute, and PHPUnit 12 removes annotation support entirely. Always key each case in the outer array with a descriptive string. PHPUnit uses the key as the test case label, which makes failures instantly readable: "negative number returns zero" beats "data set #3".

Data providers run before setUp(), which means they cannot use any class state initialized in setUp(). This forces providers to be stateless, which is a feature: it prevents test order dependencies in the data generation step.
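
One way to live with this ordering is to keep the provider limited to literal values and combine them with setUp() state inside the test method itself. The sketch below assumes PHPUnit 10+ is installed; `DiscountTest`, `percentCases`, and the discount arithmetic are hypothetical and exist only to show the split.

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

final class DiscountTest extends TestCase
{
    private int $basePrice;

    protected function setUp(): void
    {
        // Runs before each test case, but after the provider has been evaluated.
        $this->basePrice = 200;
    }

    // Static and stateless: only literals, no $this, no setUp() values.
    public static function percentCases(): array
    {
        return [
            'no discount'   => [0, 200],
            'half off'      => [50, 100],
            'full discount' => [100, 0],
        ];
    }

    #[DataProvider('percentCases')]
    public function testAppliesPercentDiscount(int $percent, int $expected): void
    {
        // Instance state from setUp() meets the provider data here, not in the provider.
        $actual = intdiv($this->basePrice * (100 - $percent), 100);

        $this->assertSame($expected, $actual);
    }
}
```

The provider stays reusable and order-independent, while anything that needs per-test state is derived inside the test body.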

For complex objects in data providers, construct them inline or use named constructors. Don't call factory methods that depend on application state. If your data provider needs a database, you have an integration test masquerading as a unit test—extract the data generation into a seeder.
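
A minimal sketch of building value objects inline through a named constructor, assuming PHPUnit 10+ and PHP 8.1+ (for readonly properties). The `Money` class and its `fromCents()` and `format()` methods are hypothetical stand-ins for a domain object; nothing here touches application state.

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

// Hypothetical value object with a named constructor.
final class Money
{
    private function __construct(public readonly int $cents)
    {
    }

    public static function fromCents(int $cents): self
    {
        return new self($cents);
    }

    // Formats non-negative amounts as "units.cents".
    public function format(): string
    {
        return sprintf('%d.%02d', intdiv($this->cents, 100), $this->cents % 100);
    }
}

final class MoneyFormatTest extends TestCase
{
    // Objects are constructed inline from literals; no container, no database.
    public static function formatCases(): iterable
    {
        yield 'zero'            => [Money::fromCents(0), '0.00'];
        yield 'single digit'    => [Money::fromCents(5), '0.05'];
        yield 'units and cents' => [Money::fromCents(150), '1.50'];
        yield 'round thousands' => [Money::fromCents(100000), '1000.00'];
    }

    #[DataProvider('formatCases')]
    public function testFormatsAsDecimalString(Money $money, string $expected): void
    {
        $this->assertSame($expected, $money->format());
    }
}
```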

Approach | When to use
Array of arrays | Fixed, small set of cases
Generator (yield) | Large set, computed cases, memory-sensitive
Static factory call | Shared datasets across test classes
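
For the shared-dataset row, one option in PHPUnit 10+ is the #[DataProviderExternal] attribute, which points a test at a static provider method on another class. The sketch below is an illustration under that assumption; `EmailCases`, `SignupValidationTest`, and the sample addresses are hypothetical.

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\Attributes\DataProviderExternal;
use PHPUnit\Framework\TestCase;

// Hypothetical shared dataset class; any test class can reference it.
final class EmailCases
{
    public static function invalidEmails(): iterable
    {
        yield 'missing at sign' => ['user.example.com'];
        yield 'missing domain'  => ['user@'];
        yield 'empty string'    => [''];
    }
}

final class SignupValidationTest extends TestCase
{
    #[DataProviderExternal(EmailCases::class, 'invalidEmails')]
    public function testRejectsInvalidEmail(string $email): void
    {
        // filter_var() returns false when the string is not a valid email address.
        $this->assertFalse(filter_var($email, FILTER_VALIDATE_EMAIL));
    }
}
```

Keeping the dataset in its own class means a second test class can reuse the same cases without inheriting from the first.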

Code Example

php
<?php
declare(strict_types=1);

namespace Tests\Unit;

use App\Domain\Slugifier;
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\Attributes\Test;
use PHPUnit\Framework\TestCase;

final class SlugifierTest extends TestCase
{
    // Provider: static, returns array<string, array{string, string}>
    public static function slugCases(): array
    {
        return [
            'basic lowercase'          => ['Hello World', 'hello-world'],
            'strips punctuation'       => ["What's up?", 'whats-up'],
            'collapses multiple spaces'=> ['too   many   spaces', 'too-many-spaces'],
            'trims leading dashes'     => ['--leading', 'leading'],
            'handles unicode'          => ['Héllo Wörld', 'hello-world'],
            'empty string'             => ['', ''],
            'already a slug'           => ['already-slugged', 'already-slugged'],
            'numbers preserved'        => ['PHP 8.4 is great', 'php-84-is-great'],
        ];
    }

    #[Test]
    #[DataProvider('slugCases')]
    public function it_slugifies_input_correctly(string $input, string $expected): void
    {
        $slugifier = new Slugifier();
        $this->assertSame($expected, $slugifier->slugify($input));
    }

    // Generator-based provider — useful for large or computed datasets
    public static function invalidPriceProvider(): \Generator
    {
        yield 'negative integer'    => [-1, 'Price must be non-negative'];
        yield 'negative float'      => [-0.01, 'Price must be non-negative'];
        yield 'exceeds max'         => [100_001, 'Price exceeds maximum allowed'];
        yield 'infinite float'      => [INF, 'Price must be finite'];
    }

    #[Test]
    #[DataProvider('invalidPriceProvider')]
    public function it_rejects_invalid_prices(
        int|float $price,
        string $expectedMessage
    ): void {
        $this->expectException(\InvalidArgumentException::class);
        $this->expectExceptionMessage($expectedMessage);

        new \App\Domain\Price($price);
    }

    // Multiple providers on one test (PHPUnit 10+)
    public static function positiveNumbers(): array
    {
        // 100_000 is used as the upper bound because 100_001 is rejected as exceeding the maximum above.
        return ['one' => [1], 'hundred' => [100], 'max allowed' => [100_000]];
    }

    public static function positiveFloats(): array
    {
        return ['point one' => [0.1], 'pi' => [M_PI]];
    }

    #[Test]
    #[DataProvider('positiveNumbers')]
    #[DataProvider('positiveFloats')]
    public function it_accepts_positive_numbers(int|float $n): void
    {
        $this->expectNotToPerformAssertions();
        new \App\Domain\Price($n); // should not throw
    }
}

Interview Q&A

Q: Why must data provider methods be static in PHPUnit 10+, and what does this constraint prevent?

PHPUnit invokes data providers while it builds the test suite, before any test instance is set up, because it needs the datasets to know how many tests there are before running any. This means the provider cannot depend on instance state: no $this, no setUp() values, no injected dependencies. Making the method static enforces this at the language level. In older PHPUnit versions, non-static providers were tolerated only because PHPUnit called them on a temporary instance whose setUp() had never run, which led to subtle bugs when providers accidentally relied on uninitialized state.


Q: How should you name data provider cases, and why does it matter more than you might think?

Always use descriptive string keys for your data provider arrays: 'empty string returns empty slug' instead of leaving numeric keys. PHPUnit uses the key as the test case label in output and failure messages. A failure report saying "SlugifierTest::it_slugifies_input_correctly with data set 'empty string returns empty slug'" tells you immediately what input triggered the bug. A report saying "data set #4" forces you to count lines in the provider. This becomes critical in CI where you're reading console output without IDE support.


Q: What is the correct approach when you find yourself wanting to use application services or the database inside a data provider?

You should not, and in most setups you cannot. Data providers must be stateless because they run before the test framework sets up application context, so services and connections created in setUp() do not exist yet. If you need database-sourced test cases, you have two options: either hardcode representative values in the provider (which is usually sufficient and keeps tests fast), or write a separate integration test that loads the real data and asserts on it as a whole. The latter should not be a parameterized test; it should be a single test that fetches all rows and asserts properties. The need to use the database in a provider is a design smell indicating the code under test may need refactoring toward purer functions.