I’ve been trying to recreate Alex Kladov’s implementation of the Pratt parser algorithm, described in “Simple but Powerful Pratt Parsing“, in PHP, to make sure I fully understand it. Kladov’s implementation is in Rust. There are also translations to Python and TypeScript linked from his github archive. I set out trying to read the three implementations, in languages I’m not fluent in, rather than the description, while doing my own implementation. Perhaps that was a mistake, but it did get me thinking about how programming style affects readability, and how programs should be written if their purpose is to demonstrate an algorithm. Kladov’s blog provides a good, though wordy, description of the algorithm, and I expect the other two implementations were, like mine, written to help the authors fully understand it, so it is understandable that the code is not written for readability, but also frustrating.
In 1968 Edsger Dijkstra wrote a letter to the editor of Communications of the ACM, “go to statement considered harmful“[1], which has been fairly influential, and many high level languages, including Rust, Python and TypeScript, follow his advice to not have a goto
statement. Except, they sort-of do really…
A few years later, William A Wulf wrote a paper “A case against the GOTO“[2], which argued that there are some situations where goto
is valid, but these should be supported with alternative language constructs.
The three languages I mentioned above have the break
, and the continue
statements for jumping from inside loops, which are really just special cases of goto
under different names. TypeScript/JavaScript is particularly blatant, as it allows continue
to jump to a flag, i.e. it is just goto
renamed, and slightly constrained.
On the whole I’m more in agreement with Dijkstra than Wulf: I think both break
(other than in case
statements) and continue
tend to lead to hard to read code, and are almost never needed. Modern editors make code with nested if
statements much more readable, and the condition of an if
gives a statement of intent that is missing with the pseudo goto
s. Another use of break
is to avoid multiple return
s – there are times when that’s a good idea, but, in my opinion, there are a lot more occasions when returning as soon as the routine is finished leads to much clearer code. Anthony Steele’s blog post “The Single Return Law” discusses this.
Of course, other programmers may well disagree, and if the style they’re used to involves using break
and continue
to perform goto
-like jumps to avoid a couple of extra if
s, and return
s, maybe they find that more readable.
The three implementations of Pratt’s parser algorithm are also free of comments, though it could be argued that Kladov’s detailed description suffices.
The use of lots of formatted text describing adjacent pieces of code has some resemblance to Donald Knuth’s Literate Programming, an example of which is in one of Jon Bentley’s programming pearls, “A literate Program”[3]. (Note, in the PDF of this there is a translation of := (assignment equals) to ←, which makes the code hard to read for those familiar with Pascal.) However, Knuth’s literate programming created a document that could be processed to create TeX, and hence PDFs for printing, or source code, while this is effectively two separate documents with significant duplication.
This blog post is a work in progress, more to come soon, looking at McConnell’s[4] approach (which I prefer), and the choice of language where the principle purpose is to share/demonstrate an algorithm.
- Dijkstra, Edsger W. “Letters to the editor: go to statement considered harmful.” Communications of the ACM 11.3 (1968): 147-148. ↩︎
- Wulf, William A. “A case against the GOTO.” Proceedings of the ACM annual conference-Volume 2. 1972. ↩︎
- Bentley, Jon, Don Knuth, and Doug McIlroy. “Programming pearls: A literate program.” Communications of the ACM 29.6 (1986): 471-483 ↩︎
- McConnell, Steve. Code complete. Pearson Education, 2004. ↩︎