I Don't Understand PHP

Hating on PHP is trendy.

But the language genuinely does have some fantastic bits! It didn’t become, and hasn’t remained, as popular as it is for no reason; I wish more people would take PHP a little more seriously. The “clean” request model — every request starts “clean”, no global state, nothing available except what you explicitly load from a cache or database or whatever (and nothing saved at the end that you don’t explicitly persist) — combined with the fast edit-refresh development cycle is really powerful. Other languages have variations of this, but I haven’t seen any other one that nails it quite as well as PHP does. It’s an incredible development model and developer experience and I really think people underestimate its elegance and importance. (My link above elaborates on all of this in much more detail.)

That said… there’s also a reason hating on PHP is trendy; it didn’t become, and hasn’t remained, such a derided language for no reason. Much of the language is not just bad, but fractally bad — looking closer, and attempting to understand why it’s bad, only uncovers more badness. I contrast this with something like Perl: I don’t like Perl, but the language has a perverse consistency to it; I disagree with nearly all of their design decisions but at least they sat down and made design decisions with which I can disagree!

Plenty of people have gone on long philosophical rants about PHP or tried to enumerate all the ways that it’s broken. (I linked a great one above.) I don’t have much to add in that department. Instead, I want to tell a story, taking a close look at a specific example of PHP badness and diving deep enough to show that the badness just keeps coming no matter how deep you go. It’s my favourite example of PHP’s fractal badness — but keep in mind it’s far from unique; PHP has issues like this everywhere you look.

Hey, I can be trendy too!

List Destructuring

The issue has to do with PHP’s “list” construct, which is like a simple version of pattern matching — it reads elements out of an array. It works like this:

<?php

$x = array('hello', 'world');
list($a, $b) = $x;
var_dump($a, $b);

which outputs

string(5) "hello"
string(5) "world"

OK, fine. Makes sense. So far so good.

What happens if you try to use “list” with something that isn’t an array? In particular, what happens if you try to use it with a string? Consider what I will refer to as example A:

<?php

list($a, $b) = 'hello';
var_dump($a, $b);

Running that outputs

NULL
NULL

OK, so it doesn’t work on strings. Now, quick quiz: given what we just saw, what does the following output? (With PHP-inspired consistency, I’ll refer to this as example 2.)

<?php

$x = 'hello';
list($a, $b) = $x;
var_dump($a, $b);

It should print the same thing, right? All we’ve done is introduce a variable in the middle? Surprise!

string(1) "h"
string(1) "e"

It’s not the same. Introducing a variable in the middle changed the behaviour! That’s obviously bad; it’s inconsistent and surprising. But, like so many things with PHP, the deeper you dig and the more you think about it, the worse it gets; it’s fractal. In particular, how must the PHP runtime work such that these two can be treated so differently? It’s not just inconsistent, it’s inconsistent in a way so weird that it betrays deep structural problems in the runtime’s operation and quality.

People familiar with PHP are probably accusing me of cheating right now: the above outputs were generated with PHP 5, and this particular inconsistency was fixed in the transition from PHP 5 to PHP 7. (Fractal sidenote: what happened to PHP 6? It was an attempt to revamp the language and runtime around Unicode or something like that, and it failed so utterly that the entire thing, including the major version number itself, was burnt to the ground and never spoken of again.)

Let’s look at what happened in PHP 7.

Self-Similar Mistakes

At first glance, there are two ways to fix this inconsistency. Using “list” on strings could be defined to be invalid in all cases (thus changing example 2’s output to match that of example A) or it could be defined to do character extraction (thus changing example A’s output to match that of example 2). PHP 7 chose to do the former: both examples result in nulls when run with PHP 7.

It’s consistent now! Problem solved, right? Well, things aren’t as simple as they might seem at first glance. Consider one last example. (Which, with continued inspiration from PHP’s inconsistent inconsistency, I’ll call example 3.)

<?php

$x = 'hello';
$a = $x[0];
$b = $x[1];
var_dump($a, $b);

This example uses explicit array subscripts, instead of the “list” construct, to extract elements. It basically just rephrases the question that started this journey: what should array indexing into a string do? It should do the same thing, right? Using “list” should just be syntactic sugar for array subscripts, right? So example A, example 2, and example 3 should all result in the same output? Of course not:

string(1) "h"
string(1) "e"

So yes, the inconsistency between example A and example 2 was indeed fixed in PHP 7. But it was fixed in the wrong direction, thus creating a new, different inconsistency! Example A, example 2, and example 3 should all mean the same thing; in PHP 5, example A was the odd one out, but instead of changing the behaviour of example A, they changed the behaviour of example 2, which now disagrees with example 3! It’s fractal badness: it was broken, there was a fix, but if you look closely at the fix you find the same problem over again.
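To check this on whatever PHP version is at hand, the three examples can be put side by side. This is only a sketch that combines the snippets above; the numbered variable names ($a1, $a2, $a3 and so on) are mine, and I’m deliberately not listing expected output, since that depends on the version:

```php
<?php

// Example A: destructure a string literal directly.
list($a1, $b1) = 'hello';

// Example 2: destructure the same string through a variable.
$x = 'hello';
list($a2, $b2) = $x;

// Example 3: explicit subscripts on the string.
$a3 = $x[0];
$b3 = $x[1];

echo "Example A:\n";
var_dump($a1, $b1);
echo "Example 2:\n";
var_dump($a2, $b2);
echo "Example 3:\n";
var_dump($a3, $b3);
```

Going by the outputs above, PHP 5 has example A as the odd one out, while PHP 7 has example 3 as the odd one out.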

So why was it fixed that way anyway?

RFC means Request For inConsistency

Changes to the PHP core language go through an RFC process, where a proposal is written up, discussed on the PHP internals mailing list, and finally voted upon by members of the core team. All of this happens in the open, so we can go and look at the RFC for this change and see what happened.

The vote was actually a three-way vote, between the two different ways of fixing the inconsistency and not fixing it at all. (Thankfully “do nothing” did not garner any votes. Silver lining I guess.) So both ways were debated at the time, but instead of considering that one option might actually be wrong, it was put to a vote.

A vote which went the wrong way… by one vote. One vote! Sigh.

(Fractal sidenote: PHP’s RFC rules state that “language changes” require a two-thirds majority. While it’s not entirely clear what is a “language change” and what is not, this change clearly qualifies. In this case, while “make a change” has a two-thirds majority over “don’t make a change”, neither specific change hit that threshold. It’s unclear if setting up the vote this way actually follows the RFC rules, but no one complained so I guess that’s irrelevant. The rules were eventually fixed to clarify all of this.)

The mailing list thread about the vote provides some insight into why some people voted that way. This is the point where I just make a sad sigh and shake my head. I’ll call PHP bad, fractally bad even, but I absolutely will not say the same of the actual people behind it. When I read that thread, I see well-meaning people trying hard to do the right thing — something I respect. But, sadly, that doesn’t always translate into actually considering the right problems and making the right decisions, as is the case here.

There’s a lot of orthogonal discussion in that thread, about the runtime implementation, about other types of input to “list”, etc., but the main argument for the fix that PHP 7 ended up with boils down to the idea that strings aren’t arrays of characters, and that the language doesn’t treat strings as arrays of characters in most other places, such as all of the array library functions. Therefore, having “list” reject strings is consistent with that decision. Which is true! But what this misses, the key point that was never brought up in the mailing list discussion, is that it’s inconsistent no matter which way you decide to fix this, just in different places, and those places are not equally bad! Since “list” and array indexing fulfil really similar functions in the language, it would be really nice to be able to think of “list” as just a fancy way of doing array indexing; it’s much more important to keep those two consistent with each other than to keep “list” consistent with the array library functions, which fulfil quite a different function in the language. Because PHP 7 didn’t do that, “list” is yet another unique language construct that people will have to understand, when it could have been just some simple, easy-to-understand syntactic sugar.
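For actual arrays, that mental model already holds. A minimal sketch, reusing the array from the very first example (the extra variables $c and $d are only there for illustration):

```php
<?php

$x = array('hello', 'world');

// Destructuring with list()...
list($a, $b) = $x;

// ...and the same extraction written as explicit subscripts.
$c = $x[0];
$d = $x[1];

// Compare the two approaches element by element.
var_dump($a === $c, $b === $d);
```

Both comparisons come out as bool(true). The complaint is only that the same equivalence doesn’t carry over once $x is a string.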

That is a really good point, though: why does the language treat strings as arrays of characters when doing direct indexing, but not elsewhere? Even if PHP had made the opposite decision about “list” here, that would only have been locally more consistent… the real problem underlying all of this is a wider inconsistency in string handling.
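A small sketch of that split personality, using only built-in functions (the variable names are mine). The string can be subscripted like an array, but it isn’t one, and getting “list” to see characters takes an explicit conversion such as str_split():

```php
<?php

$s = 'hello';

// Direct indexing treats the string like an array of characters...
var_dump($s[0]);

// ...but the string is not an array as far as the rest of the language goes.
var_dump(is_array($s));

// str_split() converts explicitly, producing an array of one-byte strings.
$chars = str_split($s);
var_dump(count($chars));

// Once it is a real array, list() extracts the characters as expected.
list($a, $b) = $chars;
var_dump($a, $b);
```

This prints string(1) "h", bool(false), int(5), and then string(1) "h" and string(1) "e". Note that str_split() works on bytes, so this is only a faithful stand-in for single-byte text.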

And here again we see the fractal nature of PHP’s problems: after all our digging on one issue, the result is only another inconsistency in the language not unlike the one we started with.

Inconclusion

I don’t understand PHP. I’m not sure it’s possible to. The more I try to dig into what’s going on, the more questions I end up with. Despite having some fantastic elements that I wish more people would take seriously… it simultaneously deserves all the derision that it gets.

I do think that there’s a wider lesson here, maybe? That it’s worth digging into bugs in complex systems and fully understanding them, so that you can make sure to fix them the right way? That when doing so, you may uncover a deep but non-obvious structural flaw as the root cause, of which the bug is only a minor symptom, and that fixing that flaw might be the right way to fix the entire class of bugs? (Or if you don’t fix that structural flaw, at least you’re aware of it and can work around it as best as possible?) That such structural flaws can coexist in systems with major redeeming values too, and so “throw it all out and start over” might also not be the answer?

Something like that. I dunno. PHP just makes my head hurt.