Testing DSLs and Code Generators

In my last post I said that I would leave discussion of the testing section of Wąsowski and Berger’s[1] chapter on code generation to a separate blog post, basically because it is a very tricky subject and deserves quite a bit of discussion.

The section starts off with the statement “Testing a domain implementation is easier than testing an interpreter, which is easier than testing a code generator.” This fits more or less with my experience, although testing a parser for a DSL that is written in an earlier, interpreted version of the same DSL, using that DSL’s own definition as the test code (as I have tried in the past), takes the challenges of testing to a completely different level. Once that stage has been covered, testing the code generator seems relatively simple.

As Wąsowski and Berger point out, the fundamental issue is that bugs become evident in the output code, but their causes lie in the input or the code generation. A conventional debugger is of very little help in this situation. They go on to describe the challenges in creating test oracles – basically alternative ways of getting the correct result – for code generators, and the limitations of a ‘comparison with baseline’ approach – basically testing the program output against a well-checked example.

In practice, what I have been doing up till now is very close to that comparison with baseline, using a test-driven development (TDD) approach. Wąsowski and Berger say that this strategy only works for very simple languages, which is true, but the majority of my languages are simple. In practice I think that most generative programming should be done with very simple languages, because it is about improving the process of creating near-boilerplate code that is verbose in a traditional language but could be concise in a language designed for just one problem.
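For anyone unfamiliar with the comparison-with-baseline approach, here is a minimal sketch in Python. It assumes a hypothetical toy DSL in which each line declares a record, such as Point x:int y:int, and a hypothetical generate function that emits a Python class for it. The baseline is held in a string rather than a checked-in file only to keep the example self-contained, and any mismatch is reported with difflib from Python's standard library.

```python
import difflib


def generate(dsl: str) -> str:
    """Hypothetical toy generator: emit a Python class per 'Name field:type ...' line."""
    out = []
    for line in dsl.strip().splitlines():
        name, *fields = line.split()
        params = ", ".join(f.replace(":", ": ") for f in fields)
        out.append(f"class {name}:")
        out.append(f"    def __init__(self, {params}):")
        for field in fields:
            attr = field.split(":")[0]
            out.append(f"        self.{attr} = {attr}")
        out.append("")
    return "\n".join(out)


# The baseline: hand-checked expected output for one reference input.
BASELINE = """\
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y
"""


def check_against_baseline(actual: str, baseline: str) -> list[str]:
    """Return unified-diff lines; an empty list means the output matches."""
    return list(difflib.unified_diff(
        baseline.splitlines(), actual.splitlines(),
        fromfile="baseline", tofile="generated", lineterm=""))


if __name__ == "__main__":
    diff = check_against_baseline(generate("Point x:int y:int"), BASELINE)
    print("matches baseline" if not diff else "\n".join(diff))
```

A failing diff points at the lines of output that changed, not at the part of the generator or the input that caused the change, which is exactly the limitation Wąsowski and Berger describe.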

Wąsowski and Berger go on to describe three levels in what they see as a more systematic approach to testing: robustness, structural correctness and semantic correctness. Their solution can be summarised as “test with lots of different inputs, and check that the output is always valid in the target language.” Unfortunately this does not address the specific issue of finding bugs that manifest in the output but are created elsewhere.
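The three levels can be illustrated with a small sketch. It is not taken from the book; it assumes a hypothetical toy DSL of integer expressions in reverse Polish notation, a reference interpreter that acts as the test oracle, and a generator that emits a Python function. Randomly generated programs supply the variety of inputs, parsing the output with Python's ast module checks structural correctness, and executing it and comparing the result with the interpreter checks semantic correctness.

```python
import ast
import random

# Hypothetical toy DSL: integer expressions in reverse Polish notation, e.g. "3 4 + 2 *".
OPS = ("+", "-", "*")


def interpret(rpn: str) -> int:
    """Reference interpreter for the toy DSL; used as the test oracle."""
    stack: list[int] = []
    for tok in rpn.split():
        if tok in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a - b if tok == "-" else a * b)
        else:
            stack.append(int(tok))
    (result,) = stack  # a well-formed program leaves exactly one value
    return result


def generate(rpn: str) -> str:
    """Code generator under test: translate RPN into a Python function."""
    stack: list[str] = []
    for tok in rpn.split():
        if tok in OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {tok} {b})")
        else:
            stack.append(tok)
    (expr,) = stack
    return f"def compute():\n    return {expr}\n"


def random_program(rng: random.Random, depth: int = 4) -> str:
    """Build a random well-formed RPN program to use as test input."""
    if depth == 0 or rng.random() < 0.3:
        return str(rng.randint(0, 9))
    left = random_program(rng, depth - 1)
    right = random_program(rng, depth - 1)
    return f"{left} {right} {rng.choice(OPS)}"


def check(program: str) -> None:
    # Level 1, robustness: the generator must not raise on valid input.
    source = generate(program)
    # Level 2, structural correctness: the output must parse as Python.
    ast.parse(source)
    # Level 3, semantic correctness: the output must agree with the oracle.
    namespace: dict = {}
    exec(source, namespace)
    assert namespace["compute"]() == interpret(program), f"mismatch for {program!r}"


def run(trials: int = 500, seed: int = 42) -> int:
    """Check many random programs; raise on the first failure, else return the count."""
    rng = random.Random(seed)
    for _ in range(trials):
        check(random_program(rng))
    return trials


if __name__ == "__main__":
    print(run(), "random programs passed all three levels")
```

A failing assertion names the input program that broke, but it still says nothing about where in the generator the fault lies.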

The diagram above shows the modules where a problem could occur in a code generation DSL. The green sections are very much a challenge for me at the moment, while I create the tools, but should be fairly reliable by the time end users are working with them. The yellow parts are where an end user’s bug could originate, and the red parts are where it could manifest.

Testing is a challenging area of software development and, within the context of the types of project I am familiar with, not an area with any genuinely good solutions. Unit testing came into fashion about 25 years ago, and is probably the most widely used testing approach in these projects. It is a great tool for helping developers procrastinate and get instant gratification¹ without meaningful achievement, but it is also a useful tool for coping with the maintenance of poorly designed software with excessive coupling², or with unreliable dependencies. In my experience, at least within the small development teams I know of, unit testing is either largely performative or a way of avoiding dealing with poor design.

At a more productive level, some of the ideas from unit testing are useful for test-driven development, where a piece of code is genuinely complex rather than poorly designed. I use this regularly when designing my DSL library and, together with reference input and output implementations, when creating DSLs and code generators. However, this does not address the underlying problem of finding bugs whose cause may lie in quite a separate project from where they manifest.

I don’t yet have a good solution for testing, but I think I have a glimmer of an idea. In Code Complete[2] McConnell talks about ‘scaffolding’ – code designed to test a single routine in isolation. In some ways this is a predecessor of unit testing, but with much greater flexibility. I’m wondering about writing a combination of annotations in the code and scaffolding outside the main code that would allow sections to be tested in isolation. Perhaps there could also be some way of having a lens on the code that showed only the sections used by a specific simple example exercising a manageable part of the tool.
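To make the idea slightly more concrete, here is a very rough Python sketch of both halves. Everything in it is hypothetical: a scaffold decorator plays the role of the annotation, attaching example calls to a routine so that run_scaffolds can exercise each routine on its own, and lens uses Python's standard sys.settrace hook to record which functions a single small example actually calls. The two routines at the end are toy stand-ins for parts of a template interpreter.

```python
import sys
from typing import Any, Callable

# Hypothetical registry of annotated routines: (routine, example args, expected result).
SCAFFOLDS: list[tuple[Callable, tuple, Any]] = []


def scaffold(*args: Any, expect: Any) -> Callable:
    """Hypothetical annotation: attach an isolated example call to a routine."""
    def register(func: Callable) -> Callable:
        SCAFFOLDS.append((func, args, expect))
        return func  # the routine itself is left unchanged
    return register


def run_scaffolds() -> list[str]:
    """Call each annotated routine on its own example; return any failures."""
    failures = []
    for func, args, expect in SCAFFOLDS:
        actual = func(*args)
        if actual != expect:
            failures.append(f"{func.__name__}{args}: expected {expect!r}, got {actual!r}")
    return failures


def lens(func: Callable, *args: Any) -> set[str]:
    """Hypothetical lens: run one small example and return the functions it called."""
    touched: set[str] = set()

    def tracer(frame, event, arg):
        if event == "call":
            touched.add(frame.f_code.co_name)
        return None  # no line-by-line tracing inside each frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return touched


# Toy routines standing in for parts of a template interpreter.
@scaffold("name", expect="Name")
def capitalise(word: str) -> str:
    return word[:1].upper() + word[1:]


@scaffold("get", "name", expect="getName")
def accessor_name(prefix: str, field: str) -> str:
    return prefix + capitalise(field)


def unused_helper() -> str:
    return "never reached by the example above"


if __name__ == "__main__":
    print(run_scaffolds() or "all scaffolds passed")
    print(sorted(lens(accessor_name, "get", "name")))  # ['accessor_name', 'capitalise']
```

Running the lens on the accessor_name example reports only accessor_name and capitalise and leaves out unused_helper, which is the kind of filtered view described above. A real version would probably want finer-grained information than function names alone.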

This still needs a lot of thought, and I need to get the main tools (the parser interpreter and the template interpreter) into a more complete form before I start experimenting. However, now that I’ve had the idea, I’ll keep it in mind as I write the interpreters, so that it is not too hard to implement later.

For the next few weeks I think I should concentrate on getting these two interpreters into a usable form, so my next few posts are likely to be about that.

[1] A. Wąsowski and T. Berger, Domain-Specific Languages: Effective Modeling, Automation, and Reuse, 1st ed. Cham: Springer International Publishing, 2023. doi: 10.1007/978-3-031-23669-3.
[2] S. McConnell, Code Complete: A Practical Handbook of Software Construction, reprint. Redmond, Wash.: Microsoft Press, 2001.

¹ Many years ago I was at a multi-project meeting and spoke to one of the developers who had taken over a project initiated by one of my colleagues. The new team had decided that my colleague’s code had hit a block due to poor architecture, and had chosen to rewrite it from scratch. A year into the project things weren’t going too well: they had almost got back to the functionality of the original code, but were finding progress slow, and had just started adding unit tests. A year later I spoke to the developer again. This time the project was going fantastically: they had hundreds of tests, which all passed. Unfortunately the functionality hadn’t moved on at all!
² The most common unit test, Assert(true);, does not help in any situation.

Niall B's blog