
(For some background/context to this post see my Generative Programming page.)
When I first started writing parsers for little source code generation languages, I largely ignored expressions because the simple declarative languages that I was creating had no need for expressions, that is a section of code that is a combination of values and operators that resolve to a single value. However, when I went on to think about slightly wider applications of little domain specific languages, I realised that my tools should really support expressions in a way that would be easy for non-specialist developers to add them to their languages. Conventionally, expressions are defined as part of the eBNF grammar used to generate a parser, however this can be a bit tricky, so, as my tools are intended to be used by programmers who do not specialise in writing language I wanted to create a simple alternative.
Some 30 years ago, as a learning exercise I started trying to create a BASIC interpreter. I didn’t get very far, but one thing that I did do was create a simple expression interpreter, and it was this code I went back to for inspiration. When I wrote the original code, I had no knowledge of any established parser algorithms – I was a self-taught programmer, trying to work things out for my self. I did have one rather useful skill however: for many years I’d been a user of Hewlett Packard calculators, with their unusual reverse-polish notation. This meant I was used to converting arithmetic expressions into a more computer friendly format in my head before typing into the calculator, and it was working by working out out this process properly that I created my algorithm.
The most basic form of the algorithm, in pseudo-code is:
function parse(tokenList)
stack operatorStack
stack rpStack
var term = tokenList.readNext()
rpStack.push(term)
while(not end of tokenList)
var op = tokenList.readNext()
while(operatorStack.peek().precedence > op.Precedence)
rpStack.push(operatorStack.pop())
operatorStack.push(op)
term = tokenList.readNext()
// Tidy up
while(operatorStack.peek())
rpStack.push(operatorStack.pop())
return rpStack
Parenthesis can be supported using recursion, and functions can be supported by replacing the lines term = tokenList.readNext()
(lines 4 and 11) with a call to a more advanced matching rule.
By reusing this old idea I was able to create a simple syntax for a special expression rule in my language definitions. Expressions rules start with the indicator @expression, which is followed by a list of rules for acceptable terms, followed by groups of operators with ascending precedence.
constterm = INT | FLOAT | DQSTRING
expr = @expression (IDENT|constterm) ('+', '-'), ('*', '/')
An extra modifier can be used to indicate that a precedence group has right associativity – left associativity is the default. As my tools are not intended for really esoteric languages, I have made parenthesis implicit. This means that the language writer doesn’t need to add any extra information to support sub-expressions in parenthesis. As long as the lexer recognises ‘(‘ and ‘)’ as tokens, the generated parser will support them in expressions.
Of course, this parser algorithm is not an original idea. A few weeks ago I came across a blog post by Robert Nystrom “Pratt parsers: Expression Parsing Made Easy“, which describes something very similar. That algorithm was first described by Pratt [1] in the early seventies, some 20 years before I came up with my version. Another very similar algorithm, the shunting yard algorithm, was described by Dijkstra [2] in 1961.
I found the reference to Dijkstra’s version in a couple of blog posts by Alex Kladov (matklad) “Simple but Powerful Pratt Parsing“, and “From Pratt to Dijkstra“. He includes Rust code for both versions, and his git repository also links to Python and TypeScript translations. Unfortunately I’ve had no luck getting these to run…
[1] V. R. Pratt, ‘Top down operator precedence’, in Proceedings of the 1st annual ACM SIGACT-SIGPLAN symposium on Principles of programming languages – POPL ’73, Boston, Massachusetts: ACM Press, 1973, pp. 41–51. doi: 10.1145/512927.512931.
[2] Dijkstra, Edsger (1961-11-01). “Algol 60 translation : An Algol 60 translator for the X1 and making a translator for Algol 60”. Stichting Mathematisch Centrum.
Thanks Niall this is a nice summary of your research progress in the past few weeks.