Mercury Analysis Framework

This directory formerly contained an implementation of the inter-module analysis framework described in:

Nicholas Nethercote. The Analysis Framework of HAL,
Chapter 7: Inter-module Analysis, Master's Thesis,
University of Melbourne, September 2001, revised April 2002.

The code has now been moved into the compiler directory, in the package `analysis'. The files that remain (Mmakefile, etc.) are unused.

This framework records call and answer patterns for arbitrary analyses, and performs dependency analysis to force recompilation where necessary when modules change.

Design

The analysis framework is a library which links into the client compiler, allowing the typeclass methods to examine compiler data structures. The interface is kept as compiler-independent as possible, so that other compilers which can interface with Mercury code (e.g. via .NET) could use it.

Clients of the library must define an instance of the typeclass `analysis.compiler', which describes the analyses the compiler wants to perform.
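
As a rough illustration of this idiom, here is a self-contained sketch of a typeclass instance in the same style. The class and method names below are invented for illustration; the actual methods of `analysis.compiler' deal with compiler data structures and module file names.

    % An invented stand-in for the real `analysis.compiler' class;
    % all names here are for illustration only.
    :- module compiler_instance_sketch.
    :- interface.

    :- import_module list.

    :- typeclass compiler_like(C) where [
            % A human-readable name for the client compiler.
        func compiler_name(C) = string,

            % The names of the analyses this client wants the
            % framework to record call and answer patterns for.
        func analyses(C) = list(string)
    ].

    :- type toy_compiler
        --->    toy_compiler.

    :- instance compiler_like(toy_compiler).

    :- implementation.

    :- instance compiler_like(toy_compiler) where [
        compiler_name(_) = "toy",
        analyses(_) = ["trail_usage", "exception"]
    ].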

Each analysis is described by a call pattern type and an answer pattern type. A call pattern describes the information known about the argument variables before analysing a call (by executing it in the abstract domain used by the analysis). An answer pattern describes the information known after analysing the call. Call and answer patterns must form a partial order, and must be convertible to strings.
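
For concreteness, here is a hypothetical sketch of a call pattern type with the required operations. The analysis and all the names are invented: the call pattern of an imagined groundness analysis is the set of argument positions known to be ground before the call, ordered by set inclusion.

    % A hypothetical call pattern type; the analysis and all names
    % are invented for illustration.
    :- module ground_call_pattern.
    :- interface.

    :- import_module set.

        % The set of argument positions known to be ground before
        % the call.
    :- type ground_call
        --->    ground_call(set(int)).

        % The partial order: one call pattern is at least as precise
        % as another if it guarantees groundness of at least the same
        % argument positions.
    :- pred at_least_as_precise(ground_call::in, ground_call::in)
        is semidet.

        % Patterns must be convertible to strings so that they can be
        % written to the analysis files.
    :- func to_string(ground_call) = string.

    :- implementation.

    :- import_module list.
    :- import_module string.

    at_least_as_precise(ground_call(A), ground_call(B)) :-
        set.subset(B, A).

    to_string(ground_call(Args)) =
        string.join_list(",",
            list.map(func(I) = string.from_int(I),
                set.to_sorted_list(Args))).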

Analysis database

Before analysing a module, the client should call `analysis.lookup_results' and `analysis.lookup_requests' to find the call patterns for which the module should be analysed, in addition to the "default" call pattern (top). `lookup_results' gives the call patterns which we have encountered before, whereas `lookup_requests' gives the call patterns which have recently been requested by other modules.

When analysing a module, at each call to an imported function the client should call `analysis.lookup_results' or `analysis.lookup_best_result' to find the results which match the call pattern.

If no results exist, the client should call `analysis.record_request' to ask that a specialized version be created during the next compilation of the imported module. The client should also record the suboptimal result from the imported module's analysis registry that was used in place of a more specialized result.

There is currently no way to analyse higher-order or class method calls. It might be possible to analyse such calls where the set of possibly called predicates is known, but it is better to optimize away higher-order or class method calls where possible.

The client should call `analysis.record_dependency' for each external analysis result that is made use of by the module.

When compilation of a module is complete, the client should call `analysis.write_analysis_files' to write out all information collected during the compilation.
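
Putting these steps together, the following self-contained toy sketches the flow. Every name and signature in it is invented for illustration; the real predicates live in the `analysis' package, thread an analysis database state through the compilation, and identify modules and functions properly.

    % A toy sketch of the per-module client flow; all names and
    % signatures are invented for illustration.
    :- module client_flow_sketch.
    :- interface.

    :- import_module io.

    :- pred main(io::di, io::uo) is det.

    :- implementation.

    :- import_module list.
    :- import_module string.

    :- type call_pattern == string.     % a call pattern, as a string
    :- type answer_pattern == string.   % an answer pattern, as a string

        % Step 1 (lookup_results/lookup_requests): the call patterns
        % to analyse a function for are the previously recorded ones,
        % the recently requested ones, and the default pattern, `top'.
    :- func patterns_to_analyse(list(call_pattern), list(call_pattern))
        = list(call_pattern).

    patterns_to_analyse(OldResults, Requests) =
        ["top" | OldResults ++ Requests].

        % Step 2 (lookup_results/record_request): at a call to an
        % imported function, use a recorded result matching the call
        % pattern; if there is none, fall back to `top' and record a
        % request so that a specialized version is produced the next
        % time the imported module is compiled.
    :- pred result_for_call(list({call_pattern, answer_pattern})::in,
        call_pattern::in, answer_pattern::out,
        list(call_pattern)::in, list(call_pattern)::out) is det.

    result_for_call(Results, Call, Answer, !Requests) :-
        ( if matching_answer(Results, Call, Answer0) then
            Answer = Answer0
        else
            Answer = "top",
            !:Requests = [Call | !.Requests]
        ).

    :- pred matching_answer(list({call_pattern, answer_pattern})::in,
        call_pattern::in, answer_pattern::out) is semidet.

    matching_answer([{C, A} | Rest], Call, Answer) :-
        ( if C = Call then
            Answer = A
        else
            matching_answer(Rest, Call, Answer)
        ).

    main(!IO) :-
        % One recorded result for an imported function; the second
        % call pattern has no recorded result, so a request is made.
        Results = [{"ground(1)", "terminates"}],
        result_for_call(Results, "ground(1)", A1, [], Requests0),
        result_for_call(Results, "ground(2)", A2, Requests0, Requests),
        io.format("answers: %s, %s; requests: %s\n",
            [s(A1), s(A2), s(string.string(Requests))], !IO).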

The database is called by analysis passes to record analysis requests and to look up answers for imported functions. The status of each answer recorded in the database is one of the following:

`optimal' -- the answer is as precise as the analysis will make it.

`suboptimal' -- the answer is correct, but reanalysis might produce a more precise one.

`fixpoint_invalid' -- the answer was computed from answers which have since changed, so it may be incorrect and must be recomputed.

`invalid' -- the answer describes a version of the function whose source code has since changed.

Analysis dependency checker (NYI)

The dependency checker examines the dependencies between analysis results and the state of the compilation, then orders recompilations so that no `invalid' or `fixpoint_invalid' entries remain (with an option to also eliminate `suboptimal' entries).

Each client compiler should have an option which invokes the analysis dependency checker rather than compiling code. This adjusts the status of entries in the database, then invokes the compiler's build tools (through a typeclass method) to recompile modules in the correct order.

If the implementation of a function changes, all of its answers are marked `invalid'. In addition, the results of the functions which it directly uses in the SCC of the analysis dependency graph containing it are reset: to `top' (marked `suboptimal') for greatest fixpoint analyses, or to `bottom' (marked `fixpoint_invalid') for least fixpoint analyses. This ensures that the new result for the function is not computed using potentially invalid information.

After each compilation, the dependency checker examines the changes in the analysis results for each function.

For greatest fixpoint analyses, if the new answer is:

- the same as the old answer, the entries which depend on it need not change;
- lower in the partial order (more precise) than the old answer, the entries which depend on the old answer are marked `suboptimal': they are still correct, but recompiling against the better answer might improve them;
- higher (less precise) than the old answer, the entries which depend on the old answer are marked `fixpoint_invalid': they may rely on information which no longer holds.

For least fixpoint analyses, if the new answer is:

- the same as the old answer, the entries which depend on it need not change;
- higher in the partial order than the old answer, the entries which depend on the old answer are marked `fixpoint_invalid', since they were computed from an answer below the fixpoint;
- lower than the old answer, the entries which depend on the old answer are marked `suboptimal'.

The new answer itself will be marked as `optimal'. This isn't necessarily correct -- further recompilations may change its status to `fixpoint_invalid' or `suboptimal' (or `invalid' if there are source code changes).
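
The reset and re-marking rules above can be summed up in a few lines. The type and function names in this sketch are invented for illustration; note that, with `top' as the most conservative answer, the marking of dependent entries comes down to the direction in which the answer moved.

    % The reset and re-marking rules as a self-contained sketch.
    % All names here are invented for illustration.
    :- module status_rules_sketch.
    :- interface.

    :- type fixpoint_type
        --->    greatest_fixpoint
        ;       least_fixpoint.

    :- type analysis_status
        --->    invalid             % the function's source has changed
        ;       fixpoint_invalid    % may be wrong; must be recomputed
        ;       suboptimal          % correct, but possibly improvable
        ;       optimal.

        % When a function's implementation changes, the answers of the
        % functions it uses within its SCC are reset to a safe starting
        % point: `top' is still sound, so only `suboptimal'; `bottom'
        % is not sound until the fixpoint is reached, hence
        % `fixpoint_invalid'.
    :- func status_after_reset(fixpoint_type) = analysis_status.

        % How entries which depended on an answer are re-marked after
        % the answer has moved in the partial order.
    :- type answer_change
        --->    moved_down      % the new answer is lower in the order
        ;       moved_up.       % the new answer is higher in the order

    :- func status_of_dependers(answer_change) = analysis_status.

    :- implementation.

    status_after_reset(greatest_fixpoint) = suboptimal.
    status_after_reset(least_fixpoint) = fixpoint_invalid.

    % Entries computed from an answer which moved down remain correct
    % but may now be improvable; entries computed from an answer which
    % moved up relied on information which no longer holds.
    status_of_dependers(moved_down) = suboptimal.
    status_of_dependers(moved_up) = fixpoint_invalid.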

Recompilation must proceed until there are no `invalid' or `fixpoint_invalid' entries. Optionally, optimization can proceed until there are no new requests or `suboptimal' answers.

It is the responsibility of the analysis implementor to ensure termination of the analysis process, by not generating an infinite number of requests.

Note: although `mmc' does not have an analysis dependency checker as described above, `mmc --make' now understands the intermodule analysis framework. Before the pass which compiles modules to target code there is an analysis pass, similar to the optimisation interface pass which generates the `.opt' and `.trans_opt' files. `mmc --make' repeats the analysis pass as long as there are invalid analysis registries, with the option to repeat it as many times as desired in order to bring `suboptimal' modules to `optimal'.

Granularity of dependencies

The description in Nicholas Nethercote's thesis uses fine-grained dependency tracking, where for each exported answer only the imported analysis results used to compute that answer are recorded.

For simplicity, the initial Mercury implementation will only record dependencies of entire modules on particular analysis results (effectively the exported results depend on all imported analysis results used in that compilation). This is worthwhile because none of the analyses in the Mercury compiler currently record the information required for the more precise approach, and I would expect that other compilers not designed for inter-module analysis would also not record that information.
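
Concretely, module-granularity tracking needs nothing more elaborate than a map like the following sketch (all names invented for illustration):

    % A sketch of module-granularity dependency recording. Each
    % analysis result, identified by the analysis name, the defining
    % module, the function and the call pattern, maps to the set of
    % modules whose compilation used it.
    :- module imdg_sketch.
    :- interface.

    :- import_module map.
    :- import_module set.

    :- type module_name == string.

    :- type result_id
        --->    result_id(
                    analysis_name   :: string,
                    def_module      :: module_name,
                    function_id     :: string,
                    call_pattern    :: string
                ).

    :- type imdg == map(result_id, set(module_name)).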

Differences between the implementation and Nick's thesis

We do not store "version calling patterns" in the .analysis files. However, we have .request files which contain the calling patterns for which we had no answer, and the `lookup_best_result' predicate automatically matches a calling pattern to the best possible result available.
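
Best-result matching can be pictured with the integer-set call patterns from the earlier sketch. The names and the signature below are invented; they only illustrate the matching rule, not the real `lookup_best_result' code.

    % A sketch of best-result matching; all names are invented for
    % illustration.
    :- module best_result_sketch.
    :- interface.

    :- import_module list.
    :- import_module set.

    :- type call_pattern == set(int).   % argument positions known ground

    :- type result(Ans)
        --->    result(call_pattern, Ans).

        % Succeed with the answer of an applicable recorded result
        % whose call pattern is maximal (in the precision order) among
        % those compared; fail if no recorded result applies.
    :- pred best_result(list(result(Ans))::in, call_pattern::in,
        Ans::out) is semidet.

    :- implementation.

    best_result(Results, ActualCall, Answer) :-
        list.filter(applicable(ActualCall), Results, Applicable),
        Applicable = [First | Rest],
        result(_, Answer) = list.foldl(more_demanding, Rest, First).

        % A result computed for call pattern C applies to a call which
        % guarantees at least as much as C assumes.
    :- pred applicable(call_pattern::in, result(Ans)::in) is semidet.

    applicable(ActualCall, result(C, _)) :-
        set.subset(C, ActualCall).

        % Of two applicable results, keep the one whose call pattern
        % assumes more: its answer is likely the more precise one. (A
        % pairwise sweep only finds a maximal element along the chain
        % it happens to compare; that is enough for a sketch.)
    :- func more_demanding(result(Ans), result(Ans)) = result(Ans).

    more_demanding(A, B) = Best :-
        A = result(CallA, _),
        B = result(CallB, _),
        ( if set.subset(CallB, CallA) then
            Best = A
        else
            Best = B
        ).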

We do not support "perfect calls" any differently from any other calls.

Our IMDGs only record entire modules as being dependent on some analysis result, so if an analysis result changes we can only mark/invalidate an entire module which makes use of the result. This means that having an analysis status associated with each individual analysis result is somewhat pointless at the moment: individual statuses will never be marked/invalidated, and we cannot use them to reanalyse only the procedures which actually need reanalysing.

Still to do

Page numbers are references to Nick's thesis.

We do not yet remove analysis results for procedures which no longer exist. (p. 107)

In the get_ext_answer procedure, Nick uses a glb function in addition to an is_more_precise(Ans', Ans_glb) call. I don't know if this is necessary. (p. 108)

Garbage collection of calling patterns that are no longer used in the program. Our IMDGs do not record exactly which calling patterns are in use so I'm not sure how this would be done.

Removal of IMDG arcs for modules that used to import a module, but no longer do so. This would not be hard if we remembered which modules were previously imported by M in M.imdg. (p. 110)

There is no special handling yet of procedure-level cycles. We cannot follow the approach of p. 116, as we do not have enough information in our IMDGs.

Our treatment of library modules is fairly dumb. We just assume that library modules have read-only analysis files, and that we cannot make requests to, or reanalyse, library modules. (p. 117)

Modules are currently always analysed from the bottom of the module dependency graph upwards. When some analyses start making meaningful requests (unlike now, where all call patterns are of the type `any_call'), the order in which modules are analysed should be changed to reduce unnecessary reanalysis.