the potential source of errors. This is the idea the Roslyn compiler (https://github.com/dotnet/roslyn) went with, and it seems to work well in practice: most tooling in the .NET environment can be written as so-called analyzer packages that use the Roslyn APIs. This approach, however, requires a different architecture to stay reasonably efficient and achieve the response times expected from these tools. A possible solution is the query-based compiler architecture, which splits the pipeline into smaller, independent operations called queries. It is not a large divergence from the pipeline architecture, but it allows for optimizations that greatly help responsiveness.
In Section 2 we discuss the idea behind query-based compilers and the optimizations they can offer. We also discuss languages whose semantics naturally require a different architecture, hinting towards the query-based approach. In Section 3 we describe the framework we have developed along with its main goals. While it is not the focus of our contribution, we present an experimental programming language used to showcase the capabilities of the framework. We believe that the main value of the programming language lies in the connection between its semantics and its architecture, and how that relates to query-based compilers; since a formal description is also out of scope for this paper, we do not give one for the language itself. To help the reader better understand the framework, Section 4 contains an illustrative example.
2 BACKGROUND
Language analysis tools that work in the background of the IDE need reasonably fast response times. When the developer edits the code, the tool needs to immediately show errors or suggest the best possible auto-completion options, for example. If we simply opened up the classic, pipeline-based architecture, exposing all functionality as-is, the resulting process would be highly inefficient. The reason is that the entire code analysis would have to run over the entire codebase each time the user alters the code, which is potentially very expensive. This could cause delays that are usually unacceptable for such a tool. A new architecture is needed that is not only incremental in nature, but also performs only the minimal computation required for the given analysis, hinting towards a demand-driven system. This is where query-based compilers can help.
2.1 Query-based Compilers
The idea of query-based compilers has gained popularity recently as major compilers, most notably the rustc compiler (Klabnik and Nichols, 2019), have started to turn towards this architecture. General incremental computation frameworks like Rock (Fredriksson, 2020) and Salsa (Matsakis, 2019) have extracted the principle into libraries, aiding the understanding and development of such systems. The basic architectural transformation is quite simple: instead of pipeline elements that transform the input in large batches, we define smaller declarative operations that work on individual entities in the compiler. These smaller operations are called queries (Fredriksson, 2020; Matsakis, 2019).
For example, symbol resolution might be implemented as a single pass over the AST in a classic compiler, but in a query-based compiler we would define various queries, such as the following (a sketch follows the list):
• attaching a declared symbol to an AST node
• retrieving the declared symbol of an AST node
• asking for all available symbols in a given context
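As a rough illustration, the sketch below expresses these three queries as plain functions over a query context. It is written in Rust; all types and names here are our own simplifications for presentation, not the API of any existing framework.

use std::collections::HashMap;

// Simplified stand-ins for real AST and symbol representations.
#[derive(Clone, Debug, PartialEq)]
struct Symbol { name: String }

struct AstNode { name: String, parent: Option<usize> }

// The query context: holds the input (AST nodes) and derived data.
struct Db {
    nodes: Vec<AstNode>,
    declared: HashMap<usize, Symbol>, // node index -> declared symbol
}

impl Db {
    // Query: attach a declared symbol to an AST node.
    fn declare_symbol(&mut self, node_id: usize) {
        let name = self.nodes[node_id].name.clone();
        self.declared.insert(node_id, Symbol { name });
    }

    // Query: retrieve the declared symbol of an AST node, if any.
    fn declared_symbol(&self, node_id: usize) -> Option<&Symbol> {
        self.declared.get(&node_id)
    }

    // Query: all symbols visible in the context of a node,
    // collected by walking up the parent chain.
    fn symbols_in_scope(&self, node_id: usize) -> Vec<&Symbol> {
        let mut result = Vec::new();
        let mut current = Some(node_id);
        while let Some(id) = current {
            if let Some(sym) = self.declared.get(&id) {
                result.push(sym);
            }
            current = self.nodes[id].parent;
        }
        result
    }
}

Note that each operation is an independent entry point keyed by an individual entity (a node index) rather than a stage in a fixed pass ordering.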
While a compiler with the classic pipeline architecture also has to implement these operations in some form, the key is that the queries should be implemented in such a way that they can be invoked without assuming that any previous passes have been executed. A query always assumes that no work has been done before and starts by invoking all other queries required for its own work. Figure 2 shows a possible tree of queries invoked when type checking a simple C statement. The query starts from the operation we want to perform and invokes all computation needed to reproduce its results, up until the source code is requested, which is considered a given input of the system. While this might seem inefficient at first, it is a very important step towards making the compiler a demand-driven system. In the next section we explain how redundant computations can be eliminated from the system, solving this inefficiency concern.
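To make the call structure concrete, the sketch below models such a query tree in Rust. The query names are hypothetical and the "type checking" is a deliberately trivialized stand-in; the point is that every query recomputes what it needs by calling its dependencies, bottoming out at the source text, which is the only true input.

// Hypothetical demand-driven query tree; real systems would key
// queries by file and node identifiers.
struct Compiler {
    source: String, // the only input of the system
}

impl Compiler {
    // Input query: the source code is given, not computed.
    fn source_text(&self) -> &str {
        &self.source
    }

    // Query: parse the source into a (here trivialized) AST.
    fn ast(&self) -> Vec<String> {
        self.source_text()
            .split_whitespace()
            .map(String::from)
            .collect()
    }

    // Query: the "type" of a node, computed on demand from the AST.
    fn type_of(&self, index: usize) -> &'static str {
        let nodes = self.ast(); // invokes the parse query itself
        if nodes[index].chars().all(|c| c.is_ascii_digit()) {
            "int"
        } else {
            "unknown"
        }
    }

    // Top-level query: check a node by asking for the facts it
    // depends on; no pass ordering is assumed anywhere.
    fn check(&self, index: usize) -> bool {
        self.type_of(index) == "int"
    }
}

Calling check() here naively re-runs type_of() and ast() on every invocation; this is exactly the redundancy that the optimizations of the next section remove.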
Once the inefficiency concerns are solved, the compiler and the IDE tools can share the exact same code, reducing complexity, codebase size, and the possibility of errors. Despite this promising design, there are still very few systems that have been developed to support this architecture and idea (Fredriksson, 2020; Matsakis, 2019).
2.2 Memoization
Memoization is an optimization technique that caches the results of expensive computations and looks up the cached result instead of recomputing it when the same computation is requested again.
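A minimal sketch of the technique in Rust, assuming a simple in-memory cache keyed by the query argument (the names and the stand-in computation are ours, for illustration only):

use std::collections::HashMap;

// Memoized wrapper around an expensive computation: results are
// cached, and repeated calls with the same key are answered from
// the cache instead of being recomputed.
struct Memo {
    cache: HashMap<u64, u64>,
}

impl Memo {
    fn new() -> Self {
        Memo { cache: HashMap::new() }
    }

    fn expensive(&mut self, key: u64) -> u64 {
        if let Some(&cached) = self.cache.get(&key) {
            return cached; // cache hit: no recomputation
        }
        let result = key.wrapping_mul(key); // stand-in for real work
        self.cache.insert(key, result);
        result
    }
}

Applied to queries, this means each query result is computed at most once per set of inputs, which is what turns the naive recompute-everything strategy described above into a practical one.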