Rethinking my interpreter design

My generative programming tool-kit contains two interpreters. One interprets parser definitions to create parsers, which convert code in a simple Domain Specific Language into data structures. The other interprets a template language, merging those data structures with templates to create the generated source code. Up until now these have been two separate programs with almost no common code, but really they should share the same core code. The languages have different specialist purposes, which make their overall structure and novel features very different, but the basics such as expressions and functions are the same, and both use the same data structure with the same naming convention.

A couple of weeks ago I was back looking at my parser interpreter, trying to restart progress on it, and I realised that I was using a design optimised for quick development rather than for clean separation between its parts. The basic structure of the project is a parser for the parser definition language, generated in PHP by my tools, which passes its recognised tokens to a separate, largely handwritten, builder class. The builder generates the output data structure, which is basically something close to an Abstract Syntax Tree. To speed up development I had included classes from the interpreter in the builder, and was populating them there before serialising. I now think that this is too close a relationship between these classes. Certainly, the data structures have to be the same, but that is rather different from sharing entire classes. I want the tools to be entirely separate, so that I can switch to a different language without having to rewrite everything at the same time, and I'd like the data structure that is passed between the two tools to be, as far as possible, independent of the versions of the two tools.
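As a rough illustration of that separation, here is a minimal sketch of a builder that emits nothing but plain lists and dictionaries. It is written in Python purely for brevity (the real tools are in PHP), and everything in it is hypothetical: the `AstBuilder` class, the `on_assignment` callback, the tiny expression grammar and the node layout are assumptions made for the example, not the actual format.

```python
import json


class AstBuilder:
    """Hypothetical builder: receives tokens from the parser, emits plain data.

    It imports nothing from the interpreter, so the only contract between
    the two tools is the shape of the serialised output.
    """

    def __init__(self):
        self.statements = []

    def on_assignment(self, name, value_tokens):
        # Each node is an ordinary dict; "kind" tells the reader how to treat it.
        self.statements.append({
            "kind": "assign",
            "target": name,
            "value": self._expression(value_tokens),
        })

    def _expression(self, tokens):
        # Assumed, deliberately tiny grammar: NUMBER or NUMBER '+' NUMBER.
        if len(tokens) == 1:
            return {"kind": "number", "value": float(tokens[0])}
        left, op, right = tokens
        return {
            "kind": "binary",
            "op": op,
            "left": {"kind": "number", "value": float(left)},
            "right": {"kind": "number", "value": float(right)},
        }

    def serialise(self):
        # JSON keeps the output readable by a tool written in any language.
        return json.dumps({"statements": self.statements})


builder = AstBuilder()
builder.on_assignment("total", ["1", "+", "2"])
document = json.loads(builder.serialise())
```

Because the output is only nested dictionaries and lists, the tool that reads it needs to agree on the field names and nothing else.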

My music sequencer project has a similar challenge, because I want the core of the sequencer, basically the performance part, to be completely separate from the editing interface. What I am doing there is creating simple data-only serialisation classes, which are used to pass the data over web services between the sequencer core, running on a Raspberry Pi (with no display), and another computer running the editing interface. The serialisation classes also have very short variable names, which in normal circumstances would be rather bad style, but here they stop the variable names from making up the majority of the data. The classes using that data on each side of the web link will be very different, and the serialisation objects are used only for communication and discarded once the data has been copied into the main class.

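    using System.Collections.Generic;

    // Data-only class describing a whole sequence; it carries no behaviour.
    class simpleSequenceJSON
    {
        public double lengthTicks { get; set; }
        public string preferredEditor { get; set; }
        public List<noteEventJSON> notes { get; set; }
    }

    // One note event. The one-letter names keep the serialised data small.
    class noteEventJSON
    {
        public double t { get; set; }
        public int n { get; set; }
        public int v { get; set; }
        public int l { get; set; }
    }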

A couple of data transfer classes from my sequencer project.
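The "copy, then discard" pattern described above might look something like the following sketch. It is in Python rather than the C# of the real project, and several things in it are assumptions: the `Sequence` class, the `decode` helper, the sample payload (including the `"pianoRoll"` editor name) and, in particular, the meanings guessed for `t`, `n`, `v` and `l`.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class NoteEventJSON:
    # Short names mirror the wire format; the meanings are assumed, not confirmed.
    t: float  # assumed: start time in ticks
    n: int    # assumed: note number
    v: int    # assumed: velocity
    l: int    # assumed: length in ticks


@dataclass
class SimpleSequenceJSON:
    lengthTicks: float
    preferredEditor: str
    notes: List[NoteEventJSON] = field(default_factory=list)


class Sequence:
    """Hypothetical working class on the receiving side, with readable names."""

    def __init__(self, dto: SimpleSequenceJSON):
        self.length_ticks = dto.lengthTicks
        self.preferred_editor = dto.preferredEditor
        # Copy each field out so the transfer object can be discarded.
        self.notes = [
            {"start": e.t, "note": e.n, "velocity": e.v, "length": e.l}
            for e in dto.notes
        ]


def decode(payload: str) -> SimpleSequenceJSON:
    """Turn the JSON text from the web service into transfer objects."""
    raw = json.loads(payload)
    return SimpleSequenceJSON(
        lengthTicks=raw["lengthTicks"],
        preferredEditor=raw["preferredEditor"],
        notes=[NoteEventJSON(**event) for event in raw["notes"]],
    )


# Made-up payload for illustration only.
payload = (
    '{"lengthTicks": 960, "preferredEditor": "pianoRoll", '
    '"notes": [{"t": 0, "n": 60, "v": 100, "l": 240}]}'
)
sequence = Sequence(decode(payload))
```

The transfer objects exist only for the duration of `decode` and the `Sequence` constructor; after that, the rest of the program sees only the longer, readable names.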

In my parser tool-kit I will use similar data-only classes, but rather than copying the data from them into fully functioning classes in the interpreter, I will pass them by reference and work through them almost line by line, like an old-school BASIC interpreter or IPython. I hope this will allow me to make a lot of the interpreter semi-language-agnostic, allowing for greater code reuse, and will also minimise the impact on the interpreter of any minor syntax changes in the language.
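A minimal sketch of that idea, assuming the data-only nodes are plain dictionaries tagged with a `kind` field, could look like this. It is in Python for brevity, and the node layout, the `evaluate` and `run` functions and the two supported operators are all invented for the example rather than taken from the real interpreter.

```python
def evaluate(node, variables):
    """Evaluate one data-only expression node (a plain dict)."""
    kind = node["kind"]
    if kind == "number":
        return node["value"]
    if kind == "variable":
        return variables[node["name"]]
    if kind == "binary":
        left = evaluate(node["left"], variables)
        right = evaluate(node["right"], variables)
        if node["op"] == "+":
            return left + right
        if node["op"] == "*":
            return left * right
        raise ValueError("unknown operator: " + node["op"])
    raise ValueError("unknown node kind: " + kind)


def run(statements, variables=None):
    """Step through the statements one at a time, as a BASIC interpreter would."""
    variables = {} if variables is None else variables
    for statement in statements:
        # The node is read in place; nothing is copied into interpreter classes.
        if statement["kind"] == "assign":
            variables[statement["target"]] = evaluate(statement["value"], variables)
        else:
            raise ValueError("unknown statement kind: " + statement["kind"])
    return variables


# A two-statement program expressed purely as data.
program = [
    {"kind": "assign", "target": "x", "value": {"kind": "number", "value": 2}},
    {"kind": "assign", "target": "y", "value": {
        "kind": "binary", "op": "*",
        "left": {"kind": "variable", "name": "x"},
        "right": {"kind": "number", "value": 21},
    }},
]
result = run(program)
```

The expression evaluator knows nothing about the surface syntax of the language, which is the part that could plausibly be shared between the parser interpreter and the template interpreter.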

Niall B's blog
