My interest in generative programming and little languages (PhD)

I first started creating simple tools to generate repetitive bits of source code when I was developing a large (by my standards) Java application which needed a lot of classes that would implement the same interface. The tool let me create stub class definitions from a list of class names, and also long lists of constant definitions (public static final int), which are a real pain to type in Java.
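To give a flavour of what such a tool does, here is a minimal sketch in Python. It is not the original tool: the function names make_stub and make_constants, and the choice to number the constants sequentially, are assumptions made purely for illustration.

```python
def make_stub(class_name, interface_name):
    """Return Java source for an empty class that implements an interface."""
    return (
        f"public class {class_name} implements {interface_name} {{\n"
        f"    // TODO: implement the methods of {interface_name}\n"
        f"}}\n"
    )


def make_constants(names, start=0):
    """Return one 'public static final int' line per name, numbered from start."""
    lines = []
    for offset, name in enumerate(names):
        lines.append(f"public static final int {name} = {start + offset};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical class and constant names, used only to show the output shape.
    for class_name in ["CsvImporter", "XmlImporter"]:
        print(make_stub(class_name, "Importer"))
    print(make_constants(["RED", "GREEN", "BLUE"]))
```

Given a list of names, the script prints one stub class per name followed by the constant block, which can then be pasted or written into the Java source tree.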

Later I started developing more advanced tools, beginning with one that generated WSDL definitions for SOAP interfaces along with PHP implementations of those definitions. This was a lot neater and more convenient than typing in the WSDL by hand, as it used a more conventional syntax for defining the APIs.

Gradually my generative programming tools got more sophisticated, and I started developing tools to develop generative programming tools. At this point it began to look like a potential PhD project, and that is what it has now become. I’m two nominal years into the PhD, although I am doing it part-time, so it is a six-year process, and it has been spread over rather more than six years because of a mixture of uncooperative managers and illness.

My approach to generative programming is to create a simple, concise domain-specific language (DSL) which is very good at doing one thing, and then use it to write the less concise code in a general-purpose language to do that job. Each tool consists of a parser for the DSL, which builds a data structure that is then passed to a template language to generate the output. My intention is that the tool for building these programming tools should be easy to use, so that creating little DSLs to help with efficient generation of slightly repetitive source code becomes something that any programmer can take on.
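The parser, data structure, and template pipeline described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual tool: the one-line DSL syntax "name(arg:type, ...) -> type", the function names parse_line and generate, and the use of Python's string.Template as the template language are all assumptions.

```python
from string import Template

# Template for one generated Java method signature.
METHOD_TEMPLATE = Template("    $returns $name($params);")


def parse_line(line):
    """Parse 'name(arg:type, ...) -> type' into a plain dictionary."""
    head, returns = line.split("->")
    name, arg_text = head.strip().rstrip(")").split("(")
    params = []
    for arg in arg_text.split(","):
        if arg.strip():  # skip the empty string produced by 'name()'
            arg_name, arg_type = arg.split(":")
            params.append((arg_name.strip(), arg_type.strip()))
    return {"name": name.strip(), "params": params, "returns": returns.strip()}


def generate(interface_name, dsl_text):
    """Turn DSL text into the source of a Java interface."""
    methods = [parse_line(line) for line in dsl_text.splitlines() if line.strip()]
    body = []
    for method in methods:
        params = ", ".join(f"{arg_type} {arg_name}"
                           for arg_name, arg_type in method["params"])
        body.append(METHOD_TEMPLATE.substitute(
            returns=method["returns"], name=method["name"], params=params))
    return f"public interface {interface_name} {{\n" + "\n".join(body) + "\n}\n"


if __name__ == "__main__":
    # Hypothetical API description written in the illustrative DSL.
    print(generate("UserApi", "getUser(id:int) -> String\nping() -> void"))
```

The point of the split is that the parser knows nothing about Java and the template knows nothing about the DSL, so either side can be swapped out, for example to emit PHP from the same description.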

My early tools depended on very complicated regular expressions to do both lexing and parsing, which worked well for the simplest DSLs, but quickly became unwieldy and unmaintainable as the DSL became more complicated. The current iteration separates the lexer and parser, and indeed separates the parser into two stages which could roughly be described as parsing statements and parsing structure. Although this sounds more complicated than the single-stage parsing of a tool like ANTLR, my view is that in practice, by dividing the process into clear stages, it actually simplifies it.
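One possible reading of that three-step split (lexing, then parsing statements, then parsing structure) is sketched below in Python. The toy brace-and-semicolon syntax, the function names lex, parse_statements and parse_structure, and the dictionary-based tree are assumptions for illustration only and do not describe the real implementation.

```python
import re

# A token is either an identifier or one of the punctuation marks { } ;
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|[{};])")
TERMINATORS = {"{", "}", ";"}


def lex(text):
    """Lexer: turn raw text into a flat list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_statements(tokens):
    """Stage one: group tokens into (words, terminator) statements."""
    statements, words = [], []
    for token in tokens:
        if token in TERMINATORS:
            statements.append((words, token))
            words = []
        else:
            words.append(token)
    if words:
        raise SyntaxError("unterminated statement")
    return statements


def parse_structure(statements):
    """Stage two: nest statements into a tree using the block terminators."""
    root = {"head": ["<root>"], "children": []}
    stack = [root]
    for words, terminator in statements:
        if terminator == "}":
            if words or len(stack) == 1:
                raise SyntaxError("unbalanced '}'")
            stack.pop()
            continue
        node = {"head": words, "children": []}
        stack[-1]["children"].append(node)
        if terminator == "{":
            stack.append(node)  # later statements become children of this node
    if len(stack) != 1:
        raise SyntaxError("unclosed block")
    return root


if __name__ == "__main__":
    tree = parse_structure(parse_statements(lex("enum Colour { RED; GREEN; }")))
    print(tree)
```

Each stage here has a single small job and a simple input and output, which is the sense in which splitting the work into more stages can make the whole thing easier to follow and maintain.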

Over the next couple of years, as I finish off my PhD and get my tools into a state for other people to use, I will be blogging a bit more about it, and also updating this page, which is intended to give background and context to those blog posts.