The case for partials and pipes in PHP

The Partial Function Application RFC is currently in voting, and at the moment it is narrowly failing. I wanted to take this opportunity to make the broader case for partial application, and for its related RFC, the pipe operator, in a way that is more appropriate for a blog post than the RFC body (which is, by design, more concerned with the finer details of "what").

The main pushback on the RFC so far is that the benefits don't outweigh the cost of yet more syntax in the language. That's a fair position to hold, albeit one I hope to convince you is incorrect: I believe the benefits vastly outweigh the syntax and implementation cost.

Background

While these are both themselves fairly targeted changes, their implications are far larger than just their immediate syntactic benefit. To understand the context (which I may have been poor at explaining so far, hence this post), I want to take a small step back.

The other night I happened across this video on functional programming, which is worth watching for anyone. It's the most succinct and approachable answer to "but... why?" for functional programming I've seen yet.

The key takeaways, for those who don't have the hour to watch it right now (but you should at least bookmark it), are as follows:

In software, we are drowning in complexity. Complexity and spooky action at a distance are killing us.
The answer to that is building deliberately-composable small pieces and assembling them.
That's not a new statement, by any means, but the FP (Functional Programming) approach to it is still not as widespread as it should be.
The FP approach is built on "type-safe function composition."

Composition is not a new or functional-specific concept. In OOP, the standard refrain is "composition over inheritance," in which "composition" means, in practice, "wrap objects inside each other until you get what you want."

In functional programming, composition means "stick functions end to end so that the output of one is the input for the next, in a type safe way." That is, you can take any two single-parameter functions, as long as the return type of the first matches the parameter type of the second, stick them together end to end, and get a new function.

In practice, you're doing this a lot:

$result = f(g(h($val)));

Which is very common but also very ugly to read. Functional languages therefore tend to have a way to write that in a less ugly way. The details vary, but the abstract math version of it would be:

(f ∘ g ∘ h)($val);

Which is valid code in no language, but in concept means the same thing as the previous line. But you can also give f ∘ g ∘ h a name, and poof, you now have a new function.
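As a rough illustration, composition can already be emulated in PHP with a small helper. The compose() function below is hypothetical (PHP has no built-in equivalent), and the f, g and h closures are made up; it is a sketch, assuming PHP 8.0+, that returns a new closure applying its arguments right to left, like f ∘ g ∘ h.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: compose($f, $g, $h) returns a new function
// equivalent to fn($x) => $f($g($h($x))), i.e. f ∘ g ∘ h.
function compose(callable ...$fns): Closure
{
    return function (mixed $value) use ($fns): mixed {
        // Apply right to left, matching the math notation.
        foreach (array_reverse($fns) as $fn) {
            $value = $fn($value);
        }
        return $value;
    };
}

$h = fn(string $s): string => trim($s);
$g = fn(string $s): string => strtoupper($s);
$f = fn(string $s): int => strlen($s);

// Give f ∘ g ∘ h a name, and it is just another function.
$fgh = compose($f, $g, $h);

echo $fgh('  hello  '), PHP_EOL; // 5
```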

The pipe operator

That leads us to the pipe operator proposal, which is separate from partial application but dovetails with it. The pipe operator is, essentially, a form of ∘. It would work like this:

$result = $val |> h |> g |> f;

That is, pass the value on the left to the function on the right, and keep doing that. It's very simple, but very powerful. It's also available in a number of languages, including F#, OCaml, and Elixir.
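Until PHP has such an operator, its behavior can be approximated with a hypothetical pipe() helper, sketched here for PHP 8.0+. Note that the functions have to be referred to by string name, which is exactly the awkwardness described in the first caveat below.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: pipe($value, $h, $g, $f) behaves like the proposed
// `$value |> h |> g |> f`, feeding each result into the next callable.
function pipe(mixed $value, callable ...$fns): mixed
{
    foreach ($fns as $fn) {
        $value = $fn($value);
    }
    return $value;
}

// Functions referenced by string name: works, but opaque to tooling.
$result = pipe('  hello  ', 'trim', 'strtoupper', 'strrev');

echo $result, PHP_EOL; // OLLEH
```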

There are, however, two important caveats. One, PHP's options for referring to a function by name when not calling it directly all suck. Two, that only works if your functions take a single argument. Most of PHP's standard library does not, nor do a huge number of functions in the wild.

Partial application

There have been various proposals to resolve the first point, referencing functions. Partial application solves both problems.

Partial Function Application is a fancy-pants name for halfway calling a function. "Application" is the fancy-pants name for "calling a function," so it literally means "partially calling a function." That is, providing some of the arguments to a function and then providing the rest later. You can already do that very easily in PHP today with short-lambdas, aka arrow functions.

$replacer = fn(string $subject): string => str_replace('Hello', 'Hi', $subject);

That gives you a new function, $replacer, that in any meaningful sense is just str_replace() with the first two arguments already provided, that is, already "applied." You can now use it to transform any string into another string... and thus you can chain it (or "pipe" it, or compose it, all the same thing) either before or after any function that takes or returns a string.

You can also use it in an array_map() call. Likewise, a partially applied function that returns a boolean could be used in array_filter().
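A quick sketch of both uses, assuming PHP 8.0+ (for str_starts_with()); the sample strings and the $startsWithHi predicate are made up for illustration.

```php
<?php
declare(strict_types=1);

// str_replace() with its first two arguments already applied.
$replacer = fn(string $subject): string => str_replace('Hello', 'Hi', $subject);

$greetings = ['Hello world', 'Hello PHP', 'Goodbye'];

// Used directly as an array_map() callback.
$short = array_map($replacer, $greetings);
// ['Hi world', 'Hi PHP', 'Goodbye']

// A predicate built the same way: str_starts_with() with its second
// argument pre-filled, used as an array_filter() callback.
$startsWithHi = fn(string $s): bool => str_starts_with($s, 'Hi');
$onlyHi = array_values(array_filter($short, $startsWithHi));
// ['Hi world', 'Hi PHP']
```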

That's still very verbose, though. The whole fn(string $subject): string => prefix is redundant; everything in it can be inferred from str_replace() itself. A dedicated syntax for Partial Application would effectively drop that prefix:

$replacer = str_replace('Hello', 'Hi', ?);

The name of the argument, its type, and the return of the function are all trivially inferred from str_replace() itself. The result is the same functionality in a much more compact form, without lots of redundant syntax flotsam floating about creating visual clutter.

The RFC goes into a lot of detail about edge cases and handling oddballs (because for an RFC, those details matter), but at its core, that's all we're talking about: Making a function that delegates to another function both easier to read and easier to write.

The RFC includes two placeholders. A ? means "a single argument here, borrowed from the base function." A ... means "zero or more placeholders here, borrowed from the base function." Mainly, that allows for a really convenient shorthand: str_replace(...) would create a function that is identical to str_replace(), but because it isn't a string, you can use it as a way to refer to a function by name that tooling can follow. It does the same for methods, which are an even bigger syntactic mess right now: [$encryptor, 'hash'] becomes $encryptor->hash(...), which is both easier to read and easier for tooling to refactor for you.
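The ... shorthand is the one piece of this that can be tried today: PHP 8.1 shipped it on its own as "first-class callable syntax" (the ? placeholder is not part of that). A small sketch, assuming PHP 8.1+; the Encryptor class is a hypothetical stand-in for the $encryptor object above.

```php
<?php
declare(strict_types=1);

// Hypothetical class standing in for $encryptor.
final class Encryptor
{
    public function hash(string $data): string
    {
        return hash('sha256', $data);
    }
}

$encryptor = new Encryptor();

// Old style: a callable array, which refactoring tools can't easily follow.
$oldStyle = [$encryptor, 'hash'];

// First-class callable syntax: a real Closure referring to the method.
$newStyle = $encryptor->hash(...);

// Works for plain functions too.
$len = strlen(...);

var_dump($newStyle instanceof Closure);          // bool(true)
var_dump($oldStyle('abc') === $newStyle('abc')); // bool(true)
echo $len('hello'), PHP_EOL;                     // 5
```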

Applications of partial application

One of the pushbacks we've heard for the PFA RFC is "people don't actually write PHP like that." (Where "like that" means piping data from one function to another.) That is both true and false at the same time. (Anyone seen a cat around here?) It's true that very few people are writing code that looks like that right now, because the result would be rather ugly, syntactically. It's false in that people are still solving the problems that this style would handle in a much more compact and readable way; they are just doing so the complicated way.

Unfortunately, this space doesn't lend itself nicely to compact examples that aren't contrived. Today, pipelines are usually built using complex multi-object structures. The PHP League even has a library specifically for building pipelines. That entire library can be reduced to just the simple |> operator.

The League's package recommends making each step of a pipeline its own object with an __invoke() method, because in PHP today that's the only way to write a callable you can easily reference. However, that is a lot of extra syntax just to avoid PHP's silliness about [$foo, 'bar']. Partial application makes any function capable of being used in that sort of pipeline. That's because...

Object constructors are essentially an extremely verbose form of hard-wired partial application.
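To make that concrete, here is a sketch (PHP 8.0+, names hypothetical) of the same "replace Hello with Hi" step written both ways: once as an invokable stage object of the kind the League package recommends, and once as a closure. The constructor does nothing but capture two of str_replace()'s arguments ahead of time.

```php
<?php
declare(strict_types=1);

// Invokable-object style: the constructor captures some arguments now,
// __invoke() accepts the remaining one later.
final class ReplaceStage
{
    public function __construct(
        private string $search,
        private string $replace,
    ) {}

    public function __invoke(string $subject): string
    {
        return str_replace($this->search, $this->replace, $subject);
    }
}

// The same behavior as a closure: partial application without the class.
$replaceStage = fn(string $subject): string => str_replace('Hello', 'Hi', $subject);

$asObject = new ReplaceStage('Hello', 'Hi');

var_dump($asObject('Hello world') === $replaceStage('Hello world')); // bool(true)
```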

Let's look at some examples of where partials and pipes would simplify code.

Health check pipeline

A few months back, I was working on a health check script. The script needed to look up a series of sites from an API call, then ping each site to grab its home page, then check various things about the response (that it was an HTTP 200, that the page body had certain text in it, etc.)

In a procedural approach, it would look something like this (in PHP-pseudocode):

use GuzzleHttp\Client as GuzzleClient;

function getData()
{
    $list_of_sites = call_api($GLOBALS['credentials']);

    // Don't throw on 4xx/5xx, so the else branch below can record failures.
    $client = new GuzzleClient(['http_errors' => false]);

    $records = [];

    foreach ($list_of_sites as $url) {
        /** @var \Psr\Http\Message\ResponseInterface $response */
        $response = $client->get($url);
        if ($response->getStatusCode() === 200) {
            $body = (string)$response->getBody();
            $records[$url]['success'] = true;
            $records[$url]['credit'] = str_contains($body, 'Company name');
            $records[$url]['badge'] = str_contains($body, '<img src="some_url"');

            // Do some other multi-line computation here that produces $result.
            $records[$url]['good_cache_headers'] = $result;
        }
        else {
            $records[$url]['success'] = false;
        }
    }

    print_table($records);
}

That does work, but it's all muddled together. One step bleeds into the next, it's hard to tell which step is leaving variables lying about, etc. And if you want to add more auditing steps, the function itself just grows and grows. This style, however, is extremely common in modern PHP, although it's usually put inside a method and (incorrectly) called "object-oriented."

In more truly-OOP code, it would be vastly longer, with an interface, a series of separate classes that all take some kind of configuration dependency, and probably modify an array that gets passed to them as an "enhancer" pass. That obfuscates what's really going on behind many layers of abstraction that don't need to be there.

With pipes and filters, however, we're encouraged to break the process up into discrete steps of whatever complexity or simplicity is appropriate. (It's possible to do with short lambdas, or even with long-form closures, but the compactness of pipes and partials makes it vastly more practical to write and read.) That makes it easier to differentiate them, and easier to add or remove steps over time as the script evolves.

use GuzzleHttp\Client as GuzzleClient;
use Psr\Http\Message\ResponseInterface;

// Each helper returns the updated record, since arrays are passed by value.
function process_url(array $data, string $field, callable $c): array {
    $data[$field] = $c($data['url']);
    return $data;
}
function process(array $data, string $field, callable $c): array {
    $data[$field] = $c($data['response']);
    return $data;
}
function process_body(array $data, string $field, callable $c): array {
    $data[$field] = $c($data['body']);
    return $data;
}

function judge_cache(ResponseInterface $response) {
    // ...
}

$client = new GuzzleClient(['http_errors' => false]);

// The thing we would do to each URL.
$pipeline = fn(array $record) => $record
    |> process_url(?, 'response', $client->get(?))
    |> process(?, 'code', fn($res) => $res->getStatusCode())
    |> process(?, 'body', fn($res) => (string)$res->getBody())
    |> process_body(?, 'credit', str_contains(?, 'Company name'))
    |> process_body(?, 'badge', str_contains(?, '<img src="some_url"'))
    |> process(?, 'good_cache_headers', judge_cache(?));

// Run the pipeline.
call_api($credentials)
    |> array_map(fn($url) => ['url' => $url], ?)
    |> array_map($pipeline, ?)
    |> print_table(?);

It certainly does look different, but it's much clearer what's happening. First, we create a pipeline that will apply to a single URL. Then we make a list of records that contain just the URL in each record. Then we map the list through the pipeline. Each step can be an arbitrary function, including a method of any object, or of $this. There are two very common cases, though (assign some data to a key in the array based on the response object, and assign some data to a key in the array based on the body), so we can make those simple utilities.

If for whatever reason you wanted the entire thing to live inside a class, that's not difficult:

use GuzzleHttp\Client as GuzzleClient;
use Psr\Http\Message\ResponseInterface;

class Auditor {

    public function __construct(protected GuzzleClient $client) {}

    protected function process_url(array $data, string $field, callable $c): array {
        $data[$field] = $c($data['url']);
        return $data;
    }

    protected function process(array $data, string $field, callable $c): array {
        $data[$field] = $c($data['response']);
        return $data;
    }

    protected function process_body(array $data, string $field, callable $c): array {
        $data[$field] = $c($data['body']);
        return $data;
    }

    protected function judge_cache(ResponseInterface $response) {
        // ...
    }

    protected function pipeline(array $record): array {
        return $record
            |> $this->process_url(?, 'response', $this->client->get(?))
            |> $this->process(?, 'code', fn($res) => $res->getStatusCode())
            |> $this->process(?, 'body', fn($res) => (string)$res->getBody())
            |> $this->process_body(?, 'credit', str_contains(?, 'Company name'))
            |> $this->process_body(?, 'badge', str_contains(?, '<img src="some_url"'))
            |> $this->process(?, 'good_cache_headers', $this->judge_cache(?));
    }

    public function audit(array $list_of_sites): array {
        return $list_of_sites
            |> array_map(fn($url) => ['url' => $url], ?)
            |> array_map($this->pipeline(?), ?);
    }
}

To be clear, this is a real example from my previous job. I don't have access to the code anymore, but rest assured the non-pipes-and-partials code I used for it was vastly more complex and verbose than the example here.
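Since the proposed syntax can't be run on PHP as it stands, here is a rough sketch of the same shape in PHP 8.0+, with closures standing in for the partials and a hypothetical pipe() helper standing in for |>. The $fetch function is a made-up stand-in for the Guzzle client so the example is self-contained; it returns plain arrays rather than PSR-7 responses.

```php
<?php
declare(strict_types=1);

// Hypothetical helper emulating the proposed |> operator.
function pipe(mixed $value, callable ...$fns): mixed
{
    foreach ($fns as $fn) {
        $value = $fn($value);
    }
    return $value;
}

// Record helpers; each returns the updated copy of the record.
function process_url(array $data, string $field, callable $c): array
{
    $data[$field] = $c($data['url']);
    return $data;
}

function process(array $data, string $field, callable $c): array
{
    $data[$field] = $c($data['response']);
    return $data;
}

function process_body(array $data, string $field, callable $c): array
{
    $data[$field] = $c($data['body']);
    return $data;
}

// Hypothetical stand-in for the HTTP client: a "response" is a plain array.
$fetch = fn(string $url): array => str_contains($url, 'broken')
    ? ['status' => 500, 'body' => '']
    : ['status' => 200, 'body' => '<footer>Company name</footer>'];

// Each closure plays the role of a partial such as process(?, 'code', ...).
$pipeline = fn(array $record): array => pipe(
    $record,
    fn(array $r): array => process_url($r, 'response', $fetch),
    fn(array $r): array => process($r, 'code', fn(array $res): int => $res['status']),
    fn(array $r): array => process($r, 'body', fn(array $res): string => $res['body']),
    fn(array $r): array => process_body($r, 'credit', fn(string $b): bool => str_contains($b, 'Company name')),
);

$urls = ['https://example.com', 'https://broken.example'];
$records = array_map($pipeline, array_map(fn(string $url): array => ['url' => $url], $urls));
```

Even emulated, each step stays a separate, testable function; the proposed syntax would mostly remove the fn(array $r): array => boilerplate.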

HTTP Pipelines

Most modern applications have some form of multi-stage pipeline to handle HTTP requests. They may have stages like normalizing data, authentication, flood control, calling the action for a given request, formatting the result to ensure it is a proper HTTP response, ensuring cache headers are in place, and so on.

Today, the two most common tools for that are Symfony's HttpKernel events and the PHP-FIG's PSR-15 middleware. Most recent applications use one or the other. However, both are very verbose, and tracing through them in a debugger to see everything that happens can be a chore. With PSR-15 specifically, the process involves two different interfaces: a RequestHandler and a series of Middleware objects, with the request passed recursively from a middleware object to a request handler to the next middleware to a request handler again, and so on. It works, but it's quite verbose and produces a very deep call stack. Having tried to debug my way through such a stack, I can attest that it is no fun.
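For comparison, here is a heavily simplified sketch of that recursive shape. The Handler and Middleware interfaces are hypothetical stand-ins modeled on PSR-15's RequestHandlerInterface and MiddlewareInterface, using plain arrays instead of PSR-7 messages, and Link is a made-up adapter; real dispatchers differ in detail. Requires PHP 8.0+.

```php
<?php
declare(strict_types=1);

// Simplified stand-ins shaped like PSR-15's two interfaces.
interface Handler
{
    public function handle(array $request): array;
}

interface Middleware
{
    public function process(array $request, Handler $next): array;
}

// Adapter that wraps one middleware around "the rest of the stack".
final class Link implements Handler
{
    public function __construct(private Middleware $mw, private Handler $next) {}

    public function handle(array $request): array
    {
        return $this->mw->process($request, $this->next);
    }
}

// Innermost handler: the "controller".
final class Controller implements Handler
{
    public function handle(array $request): array
    {
        return ['status' => 200, 'seen' => $request['seen'] ?? []];
    }
}

// A middleware that records its name, then recurses into the next handler.
final class Tag implements Middleware
{
    public function __construct(private string $name) {}

    public function process(array $request, Handler $next): array
    {
        $request['seen'][] = $this->name;
        return $next->handle($request);
    }
}

// Build the stack inside out, so the first middleware listed runs first.
$handler = new Controller();
foreach (array_reverse([new Tag('normalize'), new Tag('auth'), new Tag('parse')]) as $mw) {
    $handler = new Link($mw, $handler);
}

// Every middleware adds two nested frames (Link::handle, Tag::process)
// before the controller is reached.
$response = $handler->handle([]);
```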

With pipes and partials, instead that sort of pipeline can be written like so:

$dbconn = '...';
$resolver = new ControllerResolver($dbconn);

$add_cache_headers = $container->get('CacheHeaderManager');

Request::fromGlobals()
   |> normalize_path(?)
   |> oauth_check(?)
   |> parse_body($dbconn, ?)
   |> $resolver->resolveController(?)
   |> controller_caller(?)
   |> response_handler(?, $theme_system)
   |> $add_cache_headers(?)
   |> send_response(?)
;

I'm handwaving a few dependency details for simplicity, but the core of it is, well, right there. You can see the entire application pipeline laid out together. Each step takes a request or response, and returns a request or response. If you get the order wrong, you get a type error, as you should. The call stack always remains shallow. Each step could be a function, or a method of an object, or an object that comes from the container, or an object with an __invoke() method... That's up to you. The code itself can lay out for you, very clearly, the entire structure of the application. If you want to add another step somewhere, it's trivial to see how. Just slot it in. The code itself visualizes the application.
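The type-error claim can be checked with a small sketch (PHP 8.1+, all names hypothetical), again using a pipe() helper in place of |>. Putting the steps in the wrong order fails as soon as a step receives the wrong type.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal request/response types.
final class Req
{
    public function __construct(public string $path) {}
}

final class Res
{
    public function __construct(public int $status) {}
}

// Hypothetical steps: one Req -> Req, one Req -> Res.
function normalize(Req $r): Req
{
    return new Req(rtrim($r->path, '/') ?: '/');
}

function run_controller(Req $r): Res
{
    return new Res($r->path === '/home' ? 200 : 404);
}

// Emulates the proposed |> operator.
function pipe(mixed $value, callable ...$fns): mixed
{
    foreach ($fns as $fn) {
        $value = $fn($value);
    }
    return $value;
}

$ok = pipe(new Req('/home/'), normalize(...), run_controller(...));

try {
    // Wrong order: normalize() receives a Res and rejects it.
    pipe(new Req('/home/'), run_controller(...), normalize(...));
    $caught = false;
} catch (TypeError $e) {
    $caught = true;
}
```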

Would that work without partial application syntax? Yes, but the code would be considerably uglier:

Request::fromGlobals()
  |> fn(ServerRequestInterface $request): ServerRequestInterface => normalize_path($request)
  |> fn(ServerRequestInterface $request): ServerRequestInterface => oauth_check($request)
  |> fn(ServerRequestInterface $request): ServerRequestInterface => parse_body($dbconn, $request)
  |> fn(ServerRequestInterface $request): ServerRequestInterface => $resolver->resolveController($request)
  |> fn(ServerRequestInterface $request): ResponseInterface => controller_caller($request)
  |> fn(ResponseInterface $response): ResponseInterface => response_handler($response, $theme_system)
  |> fn(ResponseInterface $response): ResponseInterface => $add_cache_headers($response)
  |> fn(ResponseInterface $response): ResponseInterface => send_response($response)
;

No one writes code like this in PHP today... because the language syntax makes it cumbersome to do. OOP was rare in PHP until the syntax got good enough to make OOP convenient and effective to write. Now OOP is everywhere in PHP.

Without pipes or partials, trying this approach looks like this:

$request = Request::fromGlobals();
$normalizedRequest = normalize_path($request);
$authenticatedRequest = oauth_check($normalizedRequest);
$parsedRequest = parse_body($dbconn, $authenticatedRequest);
$resolvedRequest = $resolver->resolveController($parsedRequest);
$controllerResponse = controller_caller($resolvedRequest);
if (!$controllerResponse instanceof ResponseInterface) {
  $controllerResponse = response_handler($controllerResponse, $theme_system);
}
$responseWithHeaders = $add_cache_headers($controllerResponse);
send_response($responseWithHeaders);

Readable, but fugly, with lots of extra visual flotsam that makes it harder to follow what's going on. But even that is easier to follow than the typical middleware stack today, which effectively does the same thing but spread across 8 files and 16 nested function calls.

If one wanted to make the steps configuration-driven, that would be possible by building up an iterable or array of callables dynamically, and then foreach()ing over them. That wouldn't work with the |> operator, which is static, but it would still work just as well with partial application.
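A sketch of that configuration-driven variant, assuming PHP 8.1+ for the normalize_path(...) syntax. The steps operate on plain arrays, and normalize_path(), add_header() and $config are hypothetical placeholders rather than any framework's API.

```php
<?php
declare(strict_types=1);

// Hypothetical steps operating on a plain array "request".
function normalize_path(array $req): array
{
    $req['path'] = rtrim($req['path'], '/') ?: '/';
    return $req;
}

function add_header(array $req, string $name, string $value): array
{
    $req['headers'][$name] = $value;
    return $req;
}

// Hypothetical configuration: which steps run.
$config = ['normalize' => true, 'powered_by' => false, 'cache' => true];

// Build the list of one-argument steps at runtime. The closures stand in
// for partials such as add_header(?, 'Cache-Control', 'max-age=60').
$steps = [];
if ($config['normalize']) {
    $steps[] = normalize_path(...);
}
if ($config['powered_by']) {
    $steps[] = fn(array $r): array => add_header($r, 'X-Powered-By', 'PHP');
}
if ($config['cache']) {
    $steps[] = fn(array $r): array => add_header($r, 'Cache-Control', 'max-age=60');
}

// Run whatever steps were configured, in order.
$request = ['path' => '/blog/', 'headers' => []];
foreach ($steps as $step) {
    $request = $step($request);
}
```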

Now, take a moment and consider your own code. How many places do you have a function that does a thing, whose only result is to compute some value that gets used in the next block of code in the function, which itself computes some value that gets used in the next block of code in the function, and so on? Or worse, it computes some value and then calls the next function itself, which does more stuff and calls the next function, and so on, creating a deeply nested mess that doesn't show you what is actually going on, nor can you test any part of it without the lower parts? If you're lucky the lower parts have been dependency injected so you can mock them, but still, that's not the part you're interested in testing.

Pipes and partials allow you to trivially chop that function up into logical pieces and string them together. The pieces can all be used independently, tested independently, and trivially plugged together to create your actual function.

Pipes and types

Another use of pipes, which relies less on partials but can still use them, is method-like behavior for any value. Like, say, scalars.

The shortcomings of PHP's array_map() and array_filter() are well known, as is the eternal "needle vs haystack" question for array functions vs string functions. Trying to use more than one such function together inevitably ends up with a mess, either deeply nested function syntax or lots and lots of variables that exist for exactly one line, so that you can call the next function down.

One proposal for that has been "scalar methods," which would be a way to attach methods to scalar values (strings, ints, arrays, etc., but mostly strings and arrays). That's been floated many times, but has never been brought forward as a full proposal. In part, it's just complicated. In part, it means a debate over which functions get promoted to being pseudo-methods of a scalar and which don't. strlen() almost certainly would be, but what about metaphone() or soundex()? Likely not. Would array_unique() be special enough to become a scalar method? I could argue either way. Pipes, however, would alleviate both problems.

That's because, in practice, any method is really the same thing as a function that takes an object as a first argument. In many languages, in fact, that's exactly what the syntax looks like. Pipes and partials allow you to make functions that will stitch together the value to operate on and the parameters to define the operation. That is:

function afilter(callable $c) {
  return array_filter(?, $c);
}

$arr |> afilter(fn($x) => $x % 2);

For a single step that's not especially interesting. But try doing that with a bunch of simple utilities, whether built with partial application or not.
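Without the ? placeholder, such helpers can be approximated today by returning a closure that remembers the callback and waits for the array. A sketch, assuming PHP 8.0+; afilter() and amap() are hypothetical names, not built-ins.

```php
<?php
declare(strict_types=1);

// Returns a function that still needs the array: a hand-rolled partial.
function afilter(callable $c): Closure
{
    return fn(array $arr): array => array_filter($arr, $c);
}

function amap(callable $c): Closure
{
    return fn(array $arr): array => array_map($c, $arr);
}

$odds = afilter(fn(int $x): bool => $x % 2 !== 0);
$double = amap(fn(int $x): int => $x * 2);

// array_filter() and single-array array_map() both preserve keys.
$result = $double($odds([1, 2, 3, 4, 5])); // [0 => 2, 2 => 6, 4 => 10]
```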

$unique_sounds = $some_string 
    |> explode(' ', ?) 
    |> amap(soundex(?)) 
    |> afilter(fn($s) => $s !== '0000')
    |> array_unique(?)
    |> array_values(?);

That compact little bit of code takes some input string, splits it, runs a soundex computation on each result, filters out unprocessable values, then gets just the unique values from the result, then reindexes the keys to create a packed array. All in one simple statement.
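For comparison, roughly the same chain in PHP 8.1+ as it stands, using first-class callables (soundex(...), array_unique(...)) and a hypothetical pipe() helper in place of |>; the input string is made up.

```php
<?php
declare(strict_types=1);

// Hypothetical helper emulating the proposed |> operator.
function pipe(mixed $value, callable ...$fns): mixed
{
    foreach ($fns as $fn) {
        $value = $fn($value);
    }
    return $value;
}

$some_string = 'Robert Rupert 42 Tymczak Robert';

$unique_sounds = pipe(
    $some_string,
    fn(string $s): array => explode(' ', $s),
    fn(array $words): array => array_map(soundex(...), $words),
    // soundex() of a word with no letters (like "42") is '0000'.
    fn(array $codes): array => array_filter($codes, fn(string $c): bool => $c !== '0000'),
    array_unique(...),
    array_values(...),
);
// ['R163', 'T522']
```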

No, you're probably not writing it like that right now. That's because what you would have to do now is far more cumbersome, and so you aren't conditioned to think in terms of connecting steps together. Once you see the pattern, though, you begin to see it in many more places, just like any new pattern you learn.

Were you to do that today, you'd be writing something like:

$result = [];
foreach (explode(' ', $some_string) as $string) {
  $temp = soundex($string);
  if ($temp !== '0000') {
    $result[] = $temp;
  }
}
$result = array_values(array_unique($result));

Which certainly makes it harder to see what is conceptually happening, what the goal is, and whether there are side effects on other variables you have to be mindful of.

Should unique() be promoted to a pseudo-method on arrays? Who cares? With pipes, any function (or method of another object) can be a pseudo-method on any value, of whatever type. That's far more flexibility for far less effort.

Or is it used?

The conventional wisdom on the Internals list seems to be that the kind of code that would benefit from partial function application is not how people write PHP, so that's not worth trying to support. (A chicken and egg problem if ever there was one.) However, as I was writing this post some folks from Parsica (a parser written in PHP) took to Twitter to say their library would benefit from it greatly. For example, they have code like this in several places:

$prepend = fn($x) => fn(array $xs): array => array_merge([$x], $xs);

From the author on Twitter:

turning it around but also currying it. The parser that uses it uses the "Applicative Functor" functionality to then apply its arguments one by one, based on what it parses. This would be super easy to do if the proposal was approved, no more writing our own curry.
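For readers unfamiliar with currying, a short sketch (PHP 7.4+) of how that quoted helper is used, plus the kind of hand-written curry helper the quote refers to. curry2() is a hypothetical name for illustration, not Parsica's actual code.

```php
<?php
declare(strict_types=1);

// The curried helper quoted above: pass $x, get back a function awaiting $xs.
$prepend = fn($x) => fn(array $xs): array => array_merge([$x], $xs);

$prependZero = $prepend(0);
$withZero = $prependZero([1, 2, 3]); // [0, 1, 2, 3]

// A hand-rolled curry for two-argument functions: the sort of helper
// the quote says they would no longer need to write themselves.
function curry2(callable $f): Closure
{
    return fn($a) => fn($b) => $f($a, $b);
}

$prepend2 = curry2(fn($x, array $xs): array => array_merge([$x], $xs));
$letters = $prepend2('a')(['b', 'c']); // ['a', 'b', 'c']
```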

The health check example above is simplified from real code that I wrote for a previous job.

This type of code does exist in PHP. It's just not widespread, because the syntax gets in the way. With partial application and pipes, we can fix that.

Conclusion

It's a valid statement that the style of coding that partials and pipes enable isn't common in PHP today. That's because those features aren't available, creating a chicken-and-egg question. The problem spaces that they would help with, however, exist all around us. Having the right hammer, we can see nails we didn't realize were there before. Let's build hammers that will let us solve issues in new, better, more composable ways.

Thank you for reading this far. It is my hope that these examples have convinced you that partial function application and the pipe operator, together, more than justify the relatively low added complexity of partial application. (Pipes are trivial, and partial application is still simpler than many other very-successful features that have been adopted in the language.)