Notes On The Design Of The Mercury Compiler

This file contains an overview of the design of the compiler.

For an overview of how the compiler fits into the rest of the Mercury system (runtime, standard library, mdbcomp library, and so on), see overall_design.html.

Outline

The main job of the compiler is to translate Mercury into a target language, which can be C, Java or C#, although it can also translate Mercury to Mercury bytecode, for a once-planned but never implemented bytecode interpreter, and once upon a time it could also translate a subset of it to Aditi and to .NET's IL.

The top-level of the compiler is in the file mercury_compiler.m. This forwards all of the work to the file mercury_compiler_main.m, which is a submodule of the top_level.m package. The basic design is that compilation is broken into the following stages:

Note that in reality the separation is not quite as simple as that. Although parsing is listed as step 1 and semantic analysis is listed as step 2, the last stage of parsing actually includes some semantic checks. And although optimization is listed as steps 3 and 5, it also occurs in steps 2, 4, and 6. For example, elimination of assignments to dead variables is done in mode analysis; middle-recursion optimization and the use of static constants for ground terms is done during code generation; and a few low-level optimizations are done in llds_out.m as we are spitting out the C code.

In addition, the compiler is actually a multi-targeted compiler with several different back-ends.

mercury_compile.m itself supervises the parsing (step 1), but it subcontracts the supervision of the later steps to other modules. Semantic analysis (step 2) is looked after by mercury_compile_front_end.m; high level transformations (step 3) by mercury_compile_middle_passes.m; and code generation, optimization and output (steps 4, 5 and 6) by mercury_compile_llds_backend.m and mercury_compile_mlds_backend.m for the LLDS and MLDS backends respectively.

The modules in the compiler are structured by being grouped into "packages". A "package" is just a meta-module, i.e. a module that contains other modules as submodules. (The submodules are almost always stored in separate files, which are named only for their final module name.) We have a package for the top-level, a package for each main pass, and finally there are also some packages for library modules that are used by more than one pass.

Taking all this into account, the package structure looks like this:

In addition to the packages mentioned above, there are also packages for the build system: make.m contains the support for the `--make' option, and recompilation.m contains the support for the `--smart-recompilation' option.

Option handling

Option handling is part of the libs.m package.

The command-line options are defined in the module options.m. mercury_compile.m calls library/getopt_io.m, passing the predicates defined in options.m as arguments, to parse them. It then invokes handle_options.m (and indirectly, op_mode.m and compute_grade.m) to postprocess the option set. The results are represented using the type globals, defined in globals.m. The globals structure is available in the HLDS representation, but it is passed around as a separate argument both before the HLDS is built and after it is no longer needed.

After we have processed the options, the rest of the action is co-ordinated either from mercury_compile.m or from make.m, depending on whether the `--make' option is in effect.

The rest of this document contains a brief introduction each module of the compiler. If you want more information about either the purpose or the design of a particular module, see the documentation at the start of that module's source code.

Build system

Support for `--make' is in the make.m package, which contains the following modules:

The build process also invokes routines in compile_target_code.m, which is part of the backend_libs.m package (see below).

Front end

1. Parsing

The parse_tree.m package

The first part of parsing is in the parse_tree.m package, which contains the modules listed below (except for the library/*.m modules, which are in the standard library). This part produces the parse_tree.m data structure, which is intended to match up as closely as possible with the source code, so that it is suitable for tasks such as pretty-printing.

We transform the results of lexical analysis (done by library/lexer.m) into the Mercury parse tree in two stages:

(There is also a parsing stage 3, which transforms the parse tree into the HLDS, but that is not part of the parse_tree.m package itself.)

The parse tree data structure definition is in prog_data.m, prog_data_event.m, prog_data_foreign.m, prog_data_pragma.m, prog_data_used_modules.m, prog_item.m and file_kind.m, while the code to create it is in the following modules.

There are several modules whose collective job it is to print (parts of) the parse tree.

There are several modules that provide utility predicates that operate on (parts of) the parse tree.

The second main use of the parse tree, because transformation into HLDS, is the generation and consumption of interface files. This is done by the following modules.

We do most, though not all, forms of module qualification on programs when they are represented as parse trees. This mans that we add module qualifiers to all types, insts and modes, check that every referred-to type, inst and mode actually exists, and that there is only possible match. This is done here because it must be done before the `.int' and `.int2' interface files are written. This code also checks whether imports are really needed in the interface.

This work is done by the modules of the module_qual.m package.

As to what is module qualified when:

The parse tree module also contains equiv_type.m, which does expansion of equivalence types, and of `with_type` and `with_inst` annotations on predicate and function type and mode declarations.

Expansion of equivalence types is really part of type-checking, but is done on the parse tree rather than on the HLDS because it turned out to be much easier to implement that way. (Though later we had to add a pass the expand equivalence types in the HLDS anyway.)

The hlds.m package

Once the stages listed above are complete, we then convert from the parse_tree data structure to another data structure which no longer attempts to maintain a one-to-one correspondence with the source code. This data structure is called the High Level Data Structure (HLDS), and it is defined in the hlds.m package.

The last stage of parsing is this conversion to HLDS, which is done mostly by the following submodules of the make_hlds module in the hlds package.

Fact table pragmas are handled by fact_table.m (which is part of the ll_backend.m package). That module also reads the facts from the declared file and compiles them into a separate C file used by the foreign_proc body of the relevant predicate.

The HLDS data structure itself is spread over the following modules:

Some modules implement a pass that decides the representation of every type used by the module being compiled:

The module hlds_out.m contains predicates to dump the HLDS to a file. These predicates print all the information the compiler has about each part of the HLDS. The module hlds_desc.m, by contrast contains predicates that describe some parts of the HLDS (e.g. goals) with brief strings, suitable for use in progress messages used for debugging.

The module hlds_defns.m contains code to print the set of definitions in the HLDS to a file. When dividing a module into two or more submodules, one can use the information thus generated to check whether the new modules include every type, inst, mode, predicate and function definition in the original module exactly once. (The other sorts of definitions, e.g. typeclass definitions, are typically so few in number that one can keep track of them in one's head.)

The hlds.m package also contains some utility modules that contain various library routines which are used by other modules that manipulate the HLDS:

2. Semantic analysis and error checking

This is the check_hlds.m package, with support from the mode_robdd.m package for constraint based mode analysis.

Any pass which can report errors or warnings must be part of this stage, so that the compiler does the right thing for options such as `--halt-at-warn' (which turns warnings into errors) and `--error-check-only' (which makes the compiler only compile up to this stage).

Middle end

3. High-level transformations

This is the transform_hlds.m package.

The results of these program analyses are written out to `.trans_opt' files by intermod.m. intermod.m is also responsible for creating `.opt' files. Besides containing some analysis results, `.opt' files may also contain contains clauses for predicates (exported or local), if these clauses are suitable for other optimizations such as inlining or higher-order specialization.

Most of the remaining HLDS-to-HLDS transformations are optimizations:

The module transform.m contains stuff that is supposed to be useful for high-level optimizations (but which is not yet used).

The last three HLDS-to-HLDS transformations implement term size profiling (size_prof.m and complexity.m) and deep profiling (deep_profiling.m, in the ll_backend.m package). Both passes insert into procedure bodies, among other things, calls to procedures (some of which are impure) that record profiling information.

3-6 The LLDS, MLDS and ELDS backends

a. LLDS backend

This is the ll_backend.m package.

3a. LLDS-specific HLDS -> HLDS transformations

Before LLDS code generation, there are a few more passes which annotate the HLDS with information used for LLDS code generation, or perform LLDS-specific transformations on the HLDS:

4a. LLDS code generation

Code generation converts HLDS into LLDS. For the LLDS back-end, this is also the point at which we insert code to handle debugging and trailing, and to do heap reclamation on failure.

The code generator also calls middle_rec.m to do middle recursion optimization, which is implemented during code generation.

The code generation modules above make use of many service modules.

The purpose of export.m is to allow non-Mercury-generated C code to call Mercury-generated C code. For each `pragma export' declaration, it generates C code fragments which declare and define the C functions which are the interface stubs for procedures exported to C.

The generation of constants for RTTI data structures could also be considered a part of code generation, but for the LLDS back-end this is currently done as part of the output phase (see below).

The result of code generation is the Low Level Data Structure (llds.m), which may also contains some data structures whose types are defined in rtti.m. The code for each procedure is generated as a tree of code fragments which is then flattened.

5a. LLDS transformations

Most of the various LLDS-to-LLDS optimizations are invoked from optimize.m. They are: In addition, stdlabel.m performs standardization of labels. This is not an optimization itself, but it allows other optimizations to be evaluated more easily.

The module opt_debug.m contains utility routines used for debugging these LLDS-to-LLDS optimizations.

Several of these optimizations (frameopt and use_local_vars) also use livemap.m, a module that finds the set of locations live at each label.

Some of the low-level optimization passes use basic_block.m, which defines predicates for converting sequences of instructions to basic block format and back, as well as opt_util.m, which contains miscellaneous predicates for LLDS-to-LLDS optimization.

Use_local_vars numbering also introduces references to temporary variables in extended basic blocks in the LLDS representation of the C code. The transformation to insert the block scopes and declare the temporary variables is performed by wrap_blocks.m.

Depending on which optimization flags are enabled, optimize.m may invoke many of these passes multiple times.

6a. LLDS output

b. MLDS BACK-END

This is the ml_backend.m package.

The original LLDS code generator generates very low-level code, since the LLDS was designed to map easily to RISC architectures. The MLDS backend generates much higher-level code, suitable for generating not just C, but also Java and C# (and once upon a time, the intermediate language or IL of the .NET platform). This back-end uses the Medium Level Data Structure (mlds.m) as its intermediate representation. While the LLDS is operates on the level of assembly languages (a second generation language, with the first generation being machine code), the MLDS operates on the level of a standard third generation imperative programming language.

3b. MLDS-specific HLDS -> HLDS transformations

Before code generation, there is a pass which annotates the HLDS with information used for code generation:

When creating the MLDS back-end, we have tried to put into practice the lessons we have learned from writing the LLDS backend. One of these is that we prefer to do things as HLDS to HLDS transformations where possible, since this is much easier to debug initially and to modify later than the alternative approach of doing things inside the HLDS to MLDS code generator. Thus we have these two passes that use HLDS transformations to achieve effects for the MLDS backend that the LLDS backend achieves using code fragments that are scattered all over the LLDS code generator.

4b. MLDS code generation

The MLDS code generator also uses several auxiliary modules that do not themselves generate code.

5b. MLDS transformations

6b. MLDS output

There are currently three target languages in which we can output MLDS code: C, Java, and C#. Generated C code is intended to be given to a C compiler to turn the .c file into a .o or .pic_o file. Generated Java code is intended to be given to a Java compiler (normally javac) to turn the .java file into a .class file containing Java bytecodes. Generated C# code is intended to be given to a C# compiler to turn the .cs file into a .dll or .exe.

In more detail:

The mlds_to_target_util.m module contains types, functions and predicates that are needed by more than one of these MLDS backends.

c. Bytecode backend

This is the bytecode_backend.m package.

The Mercury compiler can translate Mercury programs into bytecode for interpretation by a bytecode interpreter. The intent of this is to achieve faster turn-around time during development. However, the bytecode interpreter has not yet been written.

Smart recompilation

The Mercury compiler can record program dependency information to avoid unnecessary recompilations when an imported module's interface changes in a way which does not invalidate previously compiled code.

This functionality is implemented by the recompilation.m package.

Miscellaneous

The functionality of the above modules requires support from a significant number of utility predicates. Many of these are in modules that have been listed above, but some of them are not. Most of these are either in the backend_libs.m package (utility predicates used in more than one backend) or in the libs.m package (utility predicates used in many parts of the compiler, not just the various backends).

Currently undocumented