Determinism (The Mercury Language Reference Manual)

7 Determinism

7.1 Determinism categories

For each mode of a predicate or function, we categorise that mode according to how many times it can succeed, and whether or not it can fail before producing its first solution.

If all possible calls to a particular mode of a predicate or function which return to the caller (calls which terminate, do not throw an exception and do not cause a fatal runtime error)

have exactly one solution, then that mode is deterministic (det);
either have no solutions or have one solution, then that mode is semideterministic (semidet);
have at least one solution but may have more, then that mode is multisolution (multi);
have zero or more solutions, then that mode is nondeterministic (nondet);
fail without producing a solution, then that mode has a determinism of failure.

If no possible calls to a particular mode of a predicate or function can return to the caller, then that mode has a determinism of erroneous.

The determinism annotation erroneous is used on the library predicates ‘require.error/1’ and ‘exception.throw/1’, but apart from that, determinism annotations erroneous and failure are generally not needed.

To summarize:

                Maximum number of solutions
Can fail?       0               1               > 1
no              erroneous       det             multi
yes             failure         semidet         nondet

(Note: the “Can fail?” column here indicates only whether the procedure can fail before producing at least one solution; attempts to find a second solution to a particular call, e.g. for a procedure with determinism multi, are always allowed to fail.)

The determinism of each mode of a predicate or function is indicated by an annotation on the mode declaration. For example:

:- pred append(list(T), list(T), list(T)).
:- mode append(in, in, out) is det.
:- mode append(out, out, in) is multi.
:- mode append(in, in, in) is semidet.

:- func length(list(T)) = int.
:- mode length(in) = out is det.
:- mode length(in(list_skel)) = out is det.
:- mode length(in) = in is semidet.

An annotation of det or multi is an assertion that for every value each of the inputs, there exists at least one value of the outputs for which the predicate is true, or (in the case of functions) for which the function term is equal to the result term. Conversely, an annotation of det or semidet is an assertion that for every value each of the inputs, there exists at most one value of the outputs for which the predicate is true, or (in the case of functions) for which the function term is equal to the result term. These assertions are called the mode-determinism assertions; aside from assisting in reasoning about the code, they may allow an implementation to perform optimizations that would not otherwise be allowed, such as optimizing away a goal with no outputs even though it might throw an exception (see Exception handling), contain a trace goal (see Trace goals), or infinitely loop. In some cases these optimizations may not be desirable; the strict sequential semantics guarantees that they will not be performed (see Formal semantics).

If the mode of the predicate is given in the :- pred declaration rather than in a separate :- mode declaration, then the determinism annotation goes on the :- pred declaration (and similarly for functions). In particular, this is necessary if a predicate does not have any argument variables. If the determinism declaration is given on a :- func declaration without the mode, the function is assumed to have the default mode (see Modes for more information on default modes of functions).

For example:

:- pred loop(int::in) is erroneous.
loop(X) :- loop(X).

:- pred p is det.
p.

:- pred q is failure.
q :- fail.

If there is no mode declaration for a function, then the default mode for that function is considered to have been declared as det. If you want to write a partial function, i.e. one whose determinism is semidet, then you must explicitly declare the mode and determinism.

In Mercury, a function is supposed to be a true mathematical function of its arguments; that is, the value of the function’s result should be determined only by the values of its arguments. Hence, for any mode of a function that specifies that all the arguments are fully input (i.e. for which the initial inst of all the arguments is a ground inst), the determinism of that mode can only be det, semidet, erroneous, or failure.

The determinism categories form this lattice:

             erroneous
              /     \
          failure   det
             \     /   \
             semidet  multi
                 \     /
                  nondet

The higher up this lattice a determinism category is, the more the compiler knows about the number of solutions of procedures of that determinism.

7.2 Determinism checking and inference

The determinism of goals is inferred from the determinism of their component parts, according to the rules below. The inferred determinism of a procedure is just the inferred determinism of the procedure’s body.

For procedures that are local to a module, the determinism annotations may be omitted; in that case, their determinism will be inferred. (To be precise, the determinism of procedures without a determinism annotation is defined as the least fixpoint of the transformation which, given an initial assignment of the determinism det to all such procedures, applies those rules to infer a new determinism assignment for those procedures.)

It is an error to omit the determinism annotation for procedures that are exported from their containing module.

If a determinism annotation is supplied for a procedure, the declared determinism is compared against the inferred determinism. If the declared determinism is greater than or not comparable to the inferred determinism (in the partial ordering above), it is an error. If the declared determinism is less than the inferred determinism, it is not an error, but the implementation may issue a warning.

The determinism category of each goal is inferred according to the following rules. These rules work with the two components of determinism categories: whether the goal can fail without producing a solution, and the maximum number of solutions of the goal (0, 1, or more). If the inference process below reports that a goal can succeed more than once, but the goal generates no outputs that are visible from outside the goal, and the goal is not impure (see Impurity), then the final determinism of the goal will be based on the goal succeeding at most once, since the compiler will implicitly prune away any duplicate solutions.

Calls

The determinism category of a call is the determinism declared or inferred for the called mode of the called procedure.

Unifications

The determinism of a unification is either det, semidet, or failure, depending on its mode.

A unification that assigns the value of one variable to another is deterministic. A unification that constructs a structure and assigns it to a variable is also deterministic. A unification that tests whether a variable has a given top function symbol is semideterministic, unless the compiler knows the top function symbol of that variable, in which case its determinism is either det or failure depending on whether the two function symbols are the same or not. A unification that tests two variables for equality is semideterministic, unless the compiler knows that the two variables are aliases for one another, in which case the unification is deterministic, or unless the compiler knows that the two variables have different function symbols in the same position, in which case the unification has a determinism of failure.

The compiler knows the top function symbol of a variable if the previous part of the procedure definition contains a unification of the variable with a function symbol, or if the variable’s type has only one function symbol.

Conjunctions

The determinism of the empty conjunction (the goal ‘true’) is det. The conjunction ‘(A, B)’ can fail if either A can fail, or if A can succeed at least once, and B can fail. The conjunction can succeed at most zero times if either A or B can succeed at most zero times. The conjunction can succeed more than once if either A or B can succeed more than once and both A and B can succeed at least once. (If e.g. A can succeed at most zero times, then even if B can succeed many times the maximum number of solutions of the conjunction is still zero.) Otherwise, i.e. if both A and B succeed at most once, the conjunction can succeed at most once.

Switches

A disjunction is a switch if each disjunct has near its start a unification that tests the same bound variable against a different function symbol. For example, consider the common pattern

(
    L = [], empty(Out)
;
    L = [H|T], nonempty(H, T, Out)
)

If L is input to the disjunction, then the disjunction is a switch on L.

If two variables are unified with each other, then whatever function symbol one variable is unified with, the other variable is considered to be unified with the same function symbol. In the following example, since K is unified with L, the second disjunct unifies L as well as K with cons, and thus the disjunction is recognized as a switch.

(
    L = [], empty(Out)
;
    K = L, K = [H|T], nonempty(H, T, Out)
)

A switch can fail if the various arms of the switch do not cover all the function symbols in the type of the switched-on variable, or if the code in some arms of the switch can fail, bearing in mind that in each arm of the switch, the unification that tests the switched-on variable against the function symbol of that arm is considered to be deterministic. A switch can succeed several times if some arms of the switch can succeed several times, possibly because there are multiple disjuncts that test the switched-on variable against the same function symbol. A switch can succeed at most zero times only if all the reachable arms of the switch can succeed at most zero times. (A switch arm is not reachable if it unifies the switched-on variable with a function symbol that is ruled out by that variable’s initial instantiation state.)

Only unifications may occur before the test of the switched-on variable in each disjunct. Tests of the switched-on variable may occur within existential quantification goals.

The following example is a switch.

(
    Out = 1, L = []
;
    some [H, T] (
        L = [H|T],
        nonempty(H, T, Out)
    )
)

The following example is not a switch because the call in the first disjunct occurs before the test of the switched-on variable.

(
    empty(Out), L = []
;
    L = [H|T], nonempty(H, T, Out)
)

The unification of the switched-on variable with a function symbol may occur inside a nested disjunction in a given disjunct, provided that unification is preceded only by other unifications, both inside the nested disjunction and before the nested disjunction. The following example is a switch on X, provided X is bound beforehand.

(
    X = f
    p(Out)
;
    Y = X,
    (
        Y = g,
        Intermediate = 42
    ;
        Z = Y,
        Z = h(Arg),
        q(Arg, Intermediate)
    ),
    r(Intermediate, Out)
)

Disjunctions

The determinism of the empty disjunction (the goal ‘fail’) is failure. A disjunction ‘(A ; B)’ that is not a switch can fail if both A and B can fail. It can succeed at most zero times if both A and B can succeed at most zero times. It can succeed at most once if one of A and B can succeed at most once and the other can succeed at most zero times. Otherwise, i.e. if either A or B can succeed more than once, or if both A and B can succeed at least once, it can succeed more than once.

If-then-else

If the condition of an if-then-else cannot fail, the if-then-else is equivalent to the conjunction of the condition and the “then” part, and its determinism is computed accordingly. Otherwise, an if-then-else can fail if either the “then” part or the “else” part can fail. It can succeed at most zero times if the “else” part can succeed at most zero times and if at least one of the condition and the “then” part can succeed at most zero times. It can succeed more than once if any one of the condition, the “then” part and the “else” part can succeed more than once.

Negations

If the determinism of the negated goal is erroneous, then the determinism of the negation is erroneous. If the determinism of the negated goal is failure, the determinism of the negation is det. If the determinism of the negated goal is det or multi, the determinism of the negation is failure. Otherwise, the determinism of the negation is semidet.

7.3 Replacing compile-time checking with run-time checking

Note that “perfect” determinism inference is an undecidable problem, because it requires solving the halting problem. (For instance, in the following example

:- pred p(T, T).
:- mode p(in, out) is det.

p(A, B) :-
    (
        something_complicated(A, B)
    ;
        B = A
    ).

‘p/2’ can have more than one solution only if ‘something_complicated/2’ can succeed.) Sometimes, the rules specified by the Mercury language for determinism inference will infer a determinism that is not as precise as you would like. However, it is generally easy to overcome such problems. The way to do this is to replace the compiler’s static checking with some manual run-time checking. For example, if you know that a particular goal should never fail, but the compiler infers that goal to be semidet, you can check at runtime that the goal does succeed, and if it fails, call the library predicate ‘error/1’.

:- pred q(T, T).
:- mode q(in, out) is det.

q(A, B) :-
    ( if goal_that_should_never_fail(A, B0) then
        B = B0
    else
        error("goal_that_should_never_fail failed!")
    ).

The predicate error/1 has determinism erroneous, which means the compiler knows that it will never succeed or fail, so the inferred determinism for the body of q/2 is det. (Checking assumptions like this is good coding style anyway. The small amount of up-front work that Mercury requires is paid back in reduced debugging time.) Mercury’s mode analysis knows that computations with determinism erroneous can never succeed, which is why it does not require the “else” part to generate a value for B. The introduction of the new variable B0 is necessary because the condition of an if-then-else is a negated context, and can export the values it generates only to the “then” part of the if-then-else, not directly to the surrounding computation. (If the surrounding computations had direct access to values generated in conditions, they might access them even if the condition failed.)

7.4 Interfacing nondeterministic code with the real world

Normally, attempting to call a nondet or multi mode of a predicate from a predicate declared as semidet or det will cause a determinism error. So how can we call nondeterministic code from deterministic code? There are several alternative possibilities.

If you just want to see if a nondeterministic goal is satisfiable or not, without needing to know what variable bindings it produces, then there is no problem - determinism analysis considers nondet and multi goals with no non-local output variables to be semidet and det respectively.

If you want to use the values of output variables, then you need to ask yourself which one of possibly many solutions to a goal do you want? If you want all of them, you need to one of the predicates in the standard library module ‘solutions’, such as ‘solutions/2’ itself, which collects all of the solutions to a goal into a list — see Higher-order.

If you just want one solution from a predicate and don’t care which, you should declare the relevant mode of the predicate to have determinism cc_nondet or cc_multi (depending on whether you are guaranteed at least one solution or not). This tells the compiler that this mode of this predicate may have more than one solution when viewed as a statement in logic, but the implementation should stop after generating the first solution. In other words, the implementation should commit to the first solution.

The commit to the first solution means that a piece of cc_nondet or cc_multi code can never be asked to generate a second solution. If e.g. a cc_nondet call is in a conjunction, then no later goal in that conjunction (after mode reordering) may fail, because that would ask the committed choice goal for a second solution. The compiler enforces this rule.

In the declarative semantics, which solution will be the first is unpredictable, but in the operational semantics, you can predict which solution will be the first, since Mercury does depth-first search with left-to-right execution of clause bodies, though that is not on the source code form of each clause body, but on its form after mode analysis reordering to put the producer of each variable before all its consumers.

The ‘committed choice nondeterminism’ of a predicate has to be propagated up the call tree, making its callers, its callers’ callers, and so on, also cc_nondet or cc_multi, until either you get to main/2 at the top of the call tree, or you get to a location where you don’t have to propagate the committed choice context upward anymore.

While main/2 is usually declared to have determinism det, it may also be declared cc_multi. In the latter case, while the program’s declarative semantics may admit more than one solution, the implementation will stop after the first, so alternative solutions to main/2 (and hence also to cc_nondet or cc_multi predicates called directly or indirectly from ‘main/2’) are implicitly pruned away. This is similar to the “don’t care” nondeterminism of committed choice logic programming languages such as GHC.

One way to stop propagating committed choice nondeterminism is the one mentioned above: if a goal has no non-local output variables (i.e. none of the goal’s outputs are visible from outside the goal), then the goal’s solutions are indistinguishable from the outside, and the implementation will only attempt to satisfy the goal once, whether or not the goal is committed choice. Therefore if a cc_multi goal has all its outputs ignored, then the compiler considers it to be a det goal, while if a cc_nondet goal has all its outputs ignored, then the compiler considers it to be a semidet goal.

The other way to stop propagating committed choice nondeterminism is applicable when you know that all the solutions returned will be equivalent in all the ways that your program cares about. For example, you might want to find the maximum element in a set by iterating over the elements in the set. Iterating over the elements in a set in an unspecified order is a nondeterministic operation, but no matter which order you iterate over them, the maximum value in the set should be the same.

If this condition is satisfied, i.e. if you know that there will only ever be at most one distinct solution under your equality theory of the output variables, then you can use a ‘promise_equivalent_solutions’ determinism cast. If goal ‘G’ is a cc_multi goal whose outputs are X and Y, then promise_equivalent_solutions [X, Y] ( G ) promises the compiler that all solutions of G are equivalent, so that regardless of which solution of G the implementation happens to commit to, the rest of the program will compute either identical or (similarly) equivalent results. This allows the compiler to consider promise_equivalent_solutions [X, Y] ( G ) to have determinism det. Likewise, the compiler will consider promise_equivalent_solutions [X, Y] ( G ) where G is cc_nondet to have determinism semidet.

Note that specifying a user-defined equivalence relation as the equality predicate for user-defined types (see User-defined equality and comparison) means that ‘promise_equivalent_solutions’ can be used to express more general forms of equivalence. For example, if you define a set type which represents sets as unsorted lists, you would want to define a user-defined equivalence relation for that type, which could sort the lists before comparing them. The ‘promise_equivalent_solutions’ determinism cast could then be used for sets even though the lists used to represent the sets might not be in the same order in every solution.

7.5 Committed choice nondeterminism

In addition to the determinism annotations described earlier, there are “committed choice” versions of multi and nondet, called cc_multi and cc_nondet. These can be used instead of multi or nondet if all calls to that mode of the predicate (or function) occur in a context in which only one solution is needed.

Such single-solution contexts are determined as follows.

The body of any procedure declared cc_multi or cc_nondet is in a single-solution context. For example, the program entry point ‘main/2’ may be declared cc_multi, and in that case the clauses for main are in a single-solution context.
Any goal with no output variables is in a single-solution context.
If a conjunction is in a single-solution context, then the right-most conjunct is in a single-solution context, and if the right-most conjunct cannot fail, then the rest of the conjunction is also in a single-solution context. (“Right-most” here refers to the order after mode reordering.)
If an if-then-else is in a single-solution context, then the “then” part and the “else” part are in single-solution contexts, and if the “then” part cannot fail, then the condition of the if-then-else is also in a single-solution context.
For other compound goals, i.e. disjunctions, negations, and (explicitly) existentially quantified goals, if the compound goal is in a single-solution context, then the immediate sub-goals of that compound goal are also in single-solution contexts.

The compiler will check that all calls to a committed-choice mode of a predicate (or function) do indeed occur in a single-solution context.

You can declare two different modes of a predicate (or function) which differ only in “cc-ness” (i.e. one being multi and the other cc_multi, or one being nondet and the other cc_nondet). In that case, the compiler will select the appropriate one for each call depending on whether the call comes from a single-solution context or not. Calls from single-solution contexts will call the committed choice version, while calls which are not from single-solution contexts will call the backtracking version.

There are several reasons to use committed choice determinism annotations. One reason is for efficiency: committed choice annotations allow the compiler to generate much more efficient code. Another reason is for doing I/O, which is allowed only in det or cc_multi predicates, not in multi predicates. Another is for dealing with types that use non-canonical representations (see User-defined equality and comparison). And there are a variety of other applications.