In my previous post, Turning data into code, I described the latest iteration of my template language, used to convert the data structure or AST produced by the parser into source code in the target language. At that time, I was updating my IDE to support the template language, which would allow me to test larger, more realistic examples. I’m now working on the first of these more realistic examples, a template to convert a parser definition into a PHP parser, and I’m beginning to find little things I want to change in the language. Some of these changes will be simple and have little or no impact on my existing tests, but one is a fairly major change to the language that will probably affect all the existing tests.
Improving the foreach statement
The first change that I want to make is one that I spotted a while ago. One of the control structures in my language is a foreach statement, e.g. foreach(a as b), that takes an array node ‘a’ in the data structure and allows each member of that array to be used in turn with the identifier ‘b’. The array key for the member currently represented by ‘b’ is not available inside the loop, but it is often useful to have, particularly when the array is a dictionary. In the previous iteration of my tools I worked around this by allocating a member of the array to have the same value as the key. However, this is a bit clunky, and it means the parser processing code needs to know what the output template will require.
As a temporary workaround, I currently have a magic variable (created at the interpreter level), _eachKey, which is updated for each member of the array as the loop runs. This works adequately, but it is just a quick fix, not a good solution. The value can become ambiguous when foreach loops are nested, and in my current example I found I had to save the value to a temporary variable so that it was available in an inner loop. A better solution will be to copy more of the PHP syntax that inspired it. In PHP, the foreach statement has further optional syntax to pass the key into the loop: foreach(a as c=>b), so I plan to add this to my parser. Another change I should make, for consistency, is to have the loop variables ‘c’ and ‘b’ look like temporary variables in my syntax. I want to keep the AST read-only, and to have any variables created within the template language look different by starting their names with a ‘$’, so c and b should become $c and $b. This also means that I need to increase the flexibility of these temporary variables. At the moment they can only be strings, but $c might be an integer, and $b could be any valid subsection of the AST, so a mix of arrays, dictionaries, strings and integers.
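To make the intended behaviour concrete, here is a minimal sketch in Python of how an interpreter might bind a key and a value on each pass of a foreach loop. This is not my actual implementation: the names each_with_key and run_foreach, and the idea of passing the loop body as a callable, are assumptions made purely for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


def each_with_key(node: Any) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, value) pairs for an array or dictionary node of the AST."""
    if isinstance(node, dict):
        # Dictionary: the key is the dictionary key (usually a string).
        yield from node.items()
    elif isinstance(node, (list, tuple)):
        # Plain array: the key is the integer index.
        yield from enumerate(node)
    else:
        raise TypeError(
            f"foreach expects an array or dictionary, got {type(node).__name__}"
        )


def run_foreach(
    node: Any,
    key_name: Optional[str],
    value_name: str,
    variables: Dict[str, Any],
    body: Callable[[Dict[str, Any]], None],
) -> None:
    """Hypothetical interpreter step for foreach(a as $c=>$b)."""
    for key, value in each_with_key(node):
        if key_name is not None:  # the "$c=>" part is optional
            variables[key_name] = key
        # The value may be a string, integer, array or dictionary.
        variables[value_name] = value
        body(variables)


# Example data standing in for part of a parser definition (made up).
ast = {"rules": {"expr": ["term", "+", "term"], "term": ["NUMBER"]}}

lines = []
run_foreach(
    ast["rules"],
    "$c",
    "$b",
    {},
    lambda v: lines.append(f"{v['$c']}: {' '.join(v['$b'])}"),
)

indexed = []
run_foreach(
    ast["rules"]["expr"],
    "$i",
    "$tok",
    {},
    lambda v: indexed.append((v["$i"], v["$tok"])),
)
```

Because each loop names its own key variable, a nested loop no longer overwrites the outer key in the way a single shared _eachKey does.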
Restricting the scope of temporary variables
My current implementation of temporary variables is a bit simplistic – basically it’s just a dictionary. This means that all temporary variables are global in scope, which isn’t great for recursion. To fix this I’ll create a new dictionary each time a function is called, pushing the previous one onto a stack. When the function ends I’ll delete its local dictionary and pop the previous one from the stack. I’ll also allow read operations to search through the stack to find a matching identifier. I think this will be sufficient – I’ll rethink it a bit if I ever find that real global temporary variables are useful.
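The plan above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in my interpreter: the class name ScopeStack and its method names are made up, and I have assumed that writes always go to the local dictionary, which the plan above doesn’t actually settle.

```python
class ScopeStack:
    """Temporary variables with one dictionary per active function call."""

    def __init__(self):
        self._current = {}  # dictionary for the function currently running
        self._saved = []    # dictionaries of the callers, innermost last

    def enter_function(self):
        # Push the caller's dictionary and start with a fresh local one.
        self._saved.append(self._current)
        self._current = {}

    def leave_function(self):
        # Discard the local dictionary and restore the caller's.
        if not self._saved:
            raise RuntimeError("leave_function called without enter_function")
        self._current = self._saved.pop()

    def assign(self, name, value):
        # Assumption: writes never modify a caller's variables.
        self._current[name] = value

    def lookup(self, name):
        # Reads check the local dictionary first, then search the callers.
        if name in self._current:
            return self._current[name]
        for scope in reversed(self._saved):
            if name in scope:
                return scope[name]
        raise KeyError(name)


scopes = ScopeStack()
scopes.assign("$indent", "  ")
scopes.assign("$name", "outer")

scopes.enter_function()
scopes.assign("$name", "inner")  # shadows the caller's $name
inner_view = (scopes.lookup("$name"), scopes.lookup("$indent"))
scopes.leave_function()

outer_view = scopes.lookup("$name")  # the caller's value is untouched
```

With this arrangement a recursive call gets its own $name without clobbering the caller’s, while still being able to read values such as $indent that were set further up the stack.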
Passing variables to functions (and improving terminology)
In the previous version of my templates I described subsections of a template that are separated from the main body as ‘fragments’ – that really goes right back to an earlier system that took a fragment of code and turned it into a function that returned that code, with a few substitutions. In the latest version I called these ‘sections’, with ‘fragment’ being used (internally, in function and variable names) to refer to output strings. I didn’t want to call these functions, because they are a lot less versatile than a conventional function, and there will eventually be a number of more conventional built-in functions in the language. I’m now thinking that it would be simpler and clearer to refer to them as output functions, and I’m also wondering if they even need a keyword in their declaration. Currently one thing that is missing from the declaration is parameters – they take one parameter, a reference to a node in the AST, and that is typically a dictionary, so there is usually no need to give it a name. In the previous version, I used the keyword ‘this’ to refer to it if it was a string, or if the same reference was to be passed to another function.
I like the simplicity of this system, but I’m now also thinking there are situations where some further parameters, which would assign temporary variables within the scope of the called function, could be useful. I don’t want to start assigning a name to the referenced node, but I think a new keyword ‘data’ would be preferable to ‘this’.
I need to put some thought into the syntax for this – the aim is to make something effective for this purpose, but not too different from other programming languages, so it is easy to learn.
Source code generation literature
In my last post I mentioned that I’d not found much literature about the source output part of generative programming tools. I’ve now found a textbook that mentions it – Domain-Specific Languages: Effective modeling, automation, and reuse – and started reading the section on source output. One of the examples in that chapter is written in Scala – a language I have never looked at. It seems to have some nice features for this particular problem, but it also has similarities to my solution, so I think I’m probably on the right track.