Annual Review, and thoughts about generated code

Last week I had the final “Annual Progress Review” (APR) of my PhD project, which went reasonably well. I went in with considerable trepidation because I was expecting to get a hard time over my lack of a strong theoretical underpinning and of objective experimental results. These points were brought up, not as absolute obstacles to progress, but as things I need to address effectively. My research questions were pulled apart to some extent, but not as much as I expected. I was also expecting some questioning about the relevance of my work, given recent developments in AI, but that was barely mentioned.

There were things about my project that I don’t think I managed to get across very well. One in particular is the way I expect my parser library and tools to be used. The parser code generated by my tools is very much more readable than that of any other approach to parsing I have come across, at least at the runnable code level, which I see as a fairly major advantage. However, the main use of that advantage is that the library can be used without generating the parser at all, as a much improved way of writing handwritten parsers for small languages that are likely to need frequent updating. When using my tools to generate a parser written in PHP that uses the library, rather than a parser written in my own language, I have separated the handwritten code from the generated code by using a visitor pattern. My own little parser development language is very simple, and I wasn’t really thinking of adding enough complexity to support the visitor pattern. In practice, the visitor pattern does change code readability: the point where parsing matches are processed is moved to a separate file, potentially with a very different layout from the file where they are triggered, but the individual files are smaller. I think that is probably harder for someone who doesn’t know the code, but easier for someone who knows it well, although I don’t have evidence for that.
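As a rough illustration of that separation, here is a minimal Python sketch rather than my actual PHP output. Every name in it (GeneratedNumberListParser, SummingVisitor, visit_number, visit_end) is made up for the example, and the "grammar" is just a comma-separated list of integers. The first class stands in for generated code that only triggers matches; the second stands in for the handwritten code that processes them, which in practice would sit in a separate file.

```python
from typing import List


class GeneratedNumberListParser:
    """Stands in for generated code: it recognises input and reports matches."""

    def __init__(self, visitor) -> None:
        self.visitor = visitor

    def parse(self, text: str) -> None:
        # Toy grammar: comma-separated non-negative integers, e.g. "1, 20, 3".
        for token in text.split(","):
            token = token.strip()
            if not token.isdigit():
                raise ValueError(f"expected a number, got {token!r}")
            # The match is triggered here but processed in the visitor.
            self.visitor.visit_number(token)
        self.visitor.visit_end()


class SummingVisitor:
    """Stands in for handwritten code, normally kept in its own file."""

    def __init__(self) -> None:
        self.values: List[int] = []
        self.total = 0

    def visit_number(self, lexeme: str) -> None:
        self.values.append(int(lexeme))

    def visit_end(self) -> None:
        self.total = sum(self.values)


visitor = SummingVisitor()
GeneratedNumberListParser(visitor).parse("1, 20, 3")
print(visitor.total)  # prints 24
```

Regenerating the parser class would leave SummingVisitor untouched, which is the property that matters here; the cost is that a reader has to follow visit_number across two places to see what a match actually does.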

During the APR, I think I gave the impression that I expect generated code to be routinely modified, as I was talking about both code generation and code readability. The view of my assessors was that this is something that should never be done. As someone who spent several years writing C++ applications with the MFC class library, I’m very well aware that modifying generated code can be an absolutely normal piece of routine practice: MFC programs are almost always created by generating a skeleton and then filling in the detail. However, I do understand their point.

In “The Pragmatic Programmer” [1], Hunt and Thomas categorise code generators as either passive or active. Passive generators are run once to produce a result, and from that point onwards the result is code that can be treated in the same way as handwritten code. Active code generators, however, regenerate the code whenever an updated version is needed. Hunt and Thomas describe the code from active generators as throw-away, as it is entirely replaced each time a modification is needed. They also have a separate section in the book titled “Evil Wizards”, where they discuss other code generators (including the MFC application wizards). Here they make the point very strongly that passive code generators or wizards should create code that the programmer can understand, and indeed that the programmer should only use these tools if they understand the generated code. Since I first read it over 20 years ago, The Pragmatic Programmer has been one of the most influential books in my thinking about software development, but it is much more a book for practitioners than for theoretical computing science researchers. I’ve just spotted that a new revised edition came out five years ago, and is currently Amazon’s No. 1 bestseller in Software Design & Development.

Following the conversation at my APR, I have decided that I really should support a form of visitor pattern within my little language, so that the final stage of my parser generator can be used as an active generator. I’ll take inspiration from the most elegant approach to the visitor pattern I know of, which makes use of C#’s partial classes and partial members. My little language is not object-oriented, but what I will do is create a type of “partial function”, comparable to a partial method in C#, which can be declared and used as null, or have an implementation in a separate file which is used if it exists. This will mean that my code generator can be used as either a passive or an active generator, depending on the end user’s requirements.
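To make the “partial function” idea a little more concrete, here is a minimal sketch in Python rather than in my own language. It is only an analogy: resolve_partial, handwritten_hooks and on_number are names invented for the example, and the lookup happens at run time, whereas C# resolves partial methods at compile time. The generated code asks for a function by name; if a handwritten module supplies it, that implementation is used, and otherwise the call quietly does nothing.

```python
import importlib
import sys
import types


def resolve_partial(module_name: str, function_name: str):
    """Return the handwritten implementation if one exists, else a no-op."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        module = None  # no handwritten file was supplied
    implementation = getattr(module, function_name, None)
    if callable(implementation):
        return implementation

    def null_implementation(*args, **kwargs):
        # Declared but not implemented: calling it has no effect.
        return None

    return null_implementation


# Case 1: no handwritten module exists, so the partial function is null.
on_number = resolve_partial("handwritten_hooks_missing", "on_number")
missing_result = on_number("42")

# Case 2: simulate a handwritten file by registering a module in memory.
hooks = types.ModuleType("handwritten_hooks")
hooks.on_number = lambda lexeme: int(lexeme) * 2
sys.modules["handwritten_hooks"] = hooks
on_number = resolve_partial("handwritten_hooks", "on_number")
present_result = on_number("42")

print(missing_result, present_result)  # prints: None 84
```

With this shape the generated file never needs editing: the same output can be regenerated as often as needed (active use), and the handwritten file is only ever touched by hand.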

The major piece of advice from my APR was to look more closely at the experimental methods used by other researchers in the field, and then get some of my work published. Because my project has been strongly biased towards development rather than experimentation, having peer-reviewed publications will greatly help when it comes to submitting my thesis. I am always a bit nervous about putting my work in front of other researchers, because I am very aware that I lack the theoretical background that they have, although I do have a lot of practical experience.

[1] A. Hunt and D. Thomas, The pragmatic programmer: from journeyman to master. Reading, Mass: Addison-Wesley, 2000.

Niall B's blog
