Optimizations specific to low level code (The Mercury User’s Guide)

11.16.5 Optimizations specific to low level code

--try-switch-size N

The number of alternatives in a try/retry chain switch must be at least this number (default: 3).

--binary-switch-size N

The number of alternatives in a binary search switch must be at least this number (default: 4).
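The two thresholds above can be read as eligibility tests on the number of alternatives in a switch. The following is a minimal sketch that assumes only what the option descriptions state; `eligible_strategies` and the strategy names are hypothetical, and how the compiler chooses among eligible strategies is not modelled.

```python
def eligible_strategies(num_alternatives, try_switch_size=3, binary_switch_size=4):
    """Return the switch strategies whose size threshold is met.

    The defaults mirror the documented defaults of --try-switch-size
    and --binary-switch-size; the strategy names are illustrative only.
    """
    eligible = []
    if num_alternatives >= try_switch_size:
        eligible.append("try_chain")
    if num_alternatives >= binary_switch_size:
        eligible.append("binary_search")
    return eligible
```

With the default thresholds, a two-way switch qualifies for neither strategy in this model, a three-way switch only for the try chain, and a four-way switch for both.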

--middle-rec

Enable the middle recursion optimization.

Optimization levels 1 to 6 automatically set --middle-rec.

--simple-neg

Generate simplified code for simple negations.

Optimization levels 2 to 6 automatically set --simple-neg.

--llds-optimize

--llds-optimise

Enable the LLDS->LLDS optimization passes.

Optimization levels 0 to 6 automatically set --llds-optimize.

--optimize-repeat N

--optimise-repeat N

Iterate most LLDS->LLDS optimizations at most N times (default: 3).

Optimization levels 0 to 1 automatically set --optimize-repeat=1.

Optimization level 2 automatically sets --optimize-repeat=3.

Optimization levels 3 to 4 automatically set --optimize-repeat=4.

Optimization levels 5 to 6 automatically set --optimize-repeat=5.
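The effect of the iteration bound can be pictured with a toy driver. This is a sketch and not the compiler's own code: `run_passes` and `drop_one_nop` are hypothetical names, and stopping early once a round changes nothing is an assumption made for the illustration.

```python
def run_passes(instrs, passes, max_repeats):
    """Apply every pass in order, repeating at most max_repeats times.

    Stops early once a full round leaves the code unchanged (an
    assumption of this sketch). Returns the final code and the number
    of rounds executed.
    """
    rounds = 0
    for _ in range(max_repeats):
        before = list(instrs)
        for optimization in passes:
            instrs = optimization(instrs)
        rounds += 1
        if instrs == before:
            break
    return instrs, rounds


def drop_one_nop(instrs):
    """Hypothetical pass: remove only the first 'nop' it finds."""
    out = list(instrs)
    if "nop" in out:
        out.remove("nop")
    return out


code = ["nop", "load", "nop", "nop", "store"]
limited, rounds_limited = run_passes(code, [drop_one_nop], max_repeats=2)
full, rounds_full = run_passes(code, [drop_one_nop], max_repeats=5)
```

With a limit of 2, one `nop` survives in `limited`. With a limit of 5, the driver removes all three and stops after the fourth round, the first one that changes nothing.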

--optimize-peep

--optimise-peep

Enable local peephole optimizations.

Optimization levels 0 to 6 automatically set --optimize-peep.
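A peephole optimization inspects a small window of adjacent instructions and rewrites patterns it recognizes. The sketch below shows one classic pattern, a jump to the label that immediately follows it. The tuple encoding of instructions and the name `remove_goto_next` are assumptions of this example; the set of patterns the compiler actually recognizes is not documented here.

```python
def remove_goto_next(instrs):
    """Drop each ("goto", L) that is immediately followed by ("label", L).

    Instructions are (kind, argument) tuples; this encoding is
    hypothetical and exists only for the illustration.
    """
    out = []
    for i, instr in enumerate(instrs):
        following = instrs[i + 1] if i + 1 < len(instrs) else None
        if instr[0] == "goto" and following == ("label", instr[1]):
            continue  # control falls through to the label anyway
        out.append(instr)
    return out


code = [("op", "a"), ("goto", "L1"), ("label", "L1"), ("op", "b")]
optimized = remove_goto_next(code)
```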

--optimize-labels

--optimise-labels

Delete dead labels, and the unreachable code following them.

Optimization levels 0 to 6 automatically set --optimize-labels.
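As an illustration of dead label removal, the sketch below works on a toy instruction list that contains only unconditional gotos. The tuple encoding and the name `remove_dead_labels` are hypothetical, and conditional branches are deliberately not modelled.

```python
def remove_dead_labels(instrs):
    """Delete labels that no goto refers to, plus code that cannot be reached.

    Instructions are (kind, argument) tuples with kind in
    {"label", "goto", "op"}; this encoding is an assumption of the sketch.
    """
    referenced = {arg for kind, arg in instrs if kind == "goto"}
    out = []
    reachable = True  # can control reach the current instruction?
    for kind, arg in instrs:
        if kind == "label":
            if arg in referenced:
                reachable = True  # some goto can land here
                out.append((kind, arg))
            # A dead label is dropped; reachability is unchanged, so
            # code that is reached by fall-through is still kept.
            continue
        if reachable:
            out.append((kind, arg))
            if kind == "goto":
                reachable = False  # nothing falls through an unconditional goto
    return out


code = [
    ("op", "a"),
    ("goto", "end"),
    ("label", "unused"),
    ("op", "b"),
    ("label", "end"),
    ("op", "c"),
]
optimized = remove_dead_labels(code)
```

Here the label `unused` has no referrers and follows an unconditional goto, so both it and the instruction after it are removed.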

--optimize-jumps

--optimise-jumps

Enable the short-circuiting of jumps to jumps.

Optimization levels 0 to 6 automatically set --optimize-jumps.
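Short-circuiting a jump to a jump means retargeting a goto whose destination label is immediately followed by another goto. A minimal sketch over a toy instruction list follows; the tuple encoding and the name `short_circuit_jumps` are assumptions of this example, not the compiler's representation.

```python
def short_circuit_jumps(instrs):
    """Retarget every goto to the last label in its chain of jumps.

    Instructions are (kind, argument) tuples with kind in
    {"label", "goto", "op"}; this encoding is hypothetical.
    """
    # Map each label to the index of the instruction that follows it.
    next_index = {
        arg: i + 1 for i, (kind, arg) in enumerate(instrs) if kind == "label"
    }

    def final_target(label):
        seen = set()  # guards against cycles such as L1 -> L2 -> L1
        while label not in seen:
            seen.add(label)
            i = next_index.get(label)
            if i is None or i >= len(instrs) or instrs[i][0] != "goto":
                break  # the label is followed by ordinary code (or nothing)
            label = instrs[i][1]
        return label

    return [
        (kind, final_target(arg)) if kind == "goto" else (kind, arg)
        for kind, arg in instrs
    ]


code = [
    ("goto", "L1"),
    ("label", "L1"),
    ("goto", "L2"),
    ("label", "L2"),
    ("goto", "L3"),
    ("label", "L3"),
    ("op", "r1 = 42"),
]
optimized = short_circuit_jumps(code)
```

After the rewrite every goto in `optimized` targets `L3` directly. The intermediate labels are left in place; removing them once nothing refers to them is a separate step.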

--optimize-fulljumps

--optimise-fulljumps

Enable the elimination of jumps to ordinary code.

Optimization levels 2 to 6 automatically set --optimize-fulljumps.

--checked-nondet-tailcalls

Convert nondet calls into tail calls whenever possible, even when this requires a runtime check. This option tries to minimize stack consumption, possibly at the expense of speed.

--pessimize-tailcalls

Disable the optimization of tail calls. This option tries to minimize code size at the expense of speed.

--optimize-delay-slot

--optimise-delay-slot

Enable branch delay slot optimizations. This option is meaningful only if the target architecture has delay slots.

Optimization levels 1 to 6 automatically set --optimize-delay-slot.

--optimize-frames

--optimise-frames

Optimize the operations that maintain stack frames.

Optimization levels 1 to 6 automatically set --optimize-frames.

--optimize-reassign

--optimise-reassign

Optimize away assignments to memory locations that already hold the to-be-assigned value.

Optimization levels 3 to 6 automatically set --optimize-reassign.
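The idea behind removing redundant assignments can be sketched by tracking which constant each location is known to hold. The instruction encoding and the name `remove_redundant_assignments` are hypothetical, and the sketch conservatively forgets everything it knows at any instruction other than an assignment; the compiler's own analysis is not described here.

```python
def remove_redundant_assignments(instrs):
    """Drop assignments that store a value the location already holds.

    Instructions are ("assign", location, constant) or any other tuple,
    which is treated as possibly changing every location. This encoding
    is an assumption of the sketch.
    """
    known = {}  # location -> constant it is known to hold
    out = []
    for instr in instrs:
        if instr[0] == "assign":
            _, dest, value = instr
            if dest in known and known[dest] == value:
                continue  # the location already holds this value
            known[dest] = value
        else:
            known.clear()  # e.g. a call may overwrite anything
        out.append(instr)
    return out


code = [
    ("assign", "r1", 5),
    ("assign", "r2", 7),
    ("assign", "r1", 5),  # redundant: r1 still holds 5
    ("call", "foo"),
    ("assign", "r1", 5),  # kept: the call may have changed r1
]
optimized = remove_redundant_assignments(code)
```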

--use-local-vars

Use local variables in C code blocks wherever possible.

Optimization levels 1 to 6 automatically set --use-local-vars.

--optimize-dups

--optimise-dups

Enable elimination of duplicate code within procedures.

Optimization levels 2 to 6 automatically set --optimize-dups.

--optimize-proc-dups

--optimise-proc-dups

Enable elimination of duplicate procedures.

--common-data

Enable optimization of common data structures.

Optimization levels 0 to 6 automatically set --common-data.
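One way to picture the sharing of common data structures is as interning: identical constant terms are emitted once and every use refers to the single copy. The sketch below assumes this reading; `share_common_data` and the `common_N` names are hypothetical and do not reflect the names the compiler generates.

```python
def share_common_data(terms):
    """Assign one shared name to each distinct constant term.

    Terms must be hashable (for example, nested tuples). Returns the
    table of distinct terms and, for each input term, the name of the
    shared definition it should refer to.
    """
    table = {}  # term -> name of its single shared definition
    names = []
    for term in terms:
        if term not in table:
            table[term] = f"common_{len(table)}"
        names.append(table[term])
    return table, names


terms = [("cons", 1, "nil"), ("pair", 2, 3), ("cons", 1, "nil")]
table, names = share_common_data(terms)
```

In this example the two identical `cons` terms share one definition, so only two structures would be emitted for three uses.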

--no-common-layout-data

Disable optimization of common subsequences in layout structures.

--layout-compression-limit N

Attempt to compress the layout structures used by the debugger only as long as the arrays involved have at most N elements (default: 4000).

--emit-c-loops

Use C loop constructs to implement loops. With ‘--no-emit-c-loops’, use only gotos.

Optimization levels 1 to 6 automatically set --emit-c-loops.

--procs-per-c-function N

--procs-per-C-function N

Put the code for up to N Mercury procedures in a single C function. The default value of N is one. Increasing N can produce slightly more efficient code, but makes compilation slower.
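The grouping can be pictured as splitting a module's procedures into chunks of at most N. This is a sketch under that assumption only; `group_procedures` and the procedure names are hypothetical, and the order in which the compiler groups procedures is not modelled.

```python
def group_procedures(proc_names, procs_per_function):
    """Split a list of procedure names into chunks of at most N names.

    Each chunk stands for the procedures that would share one C function.
    """
    if procs_per_function < 1:
        raise ValueError("this sketch requires procs_per_function >= 1")
    return [
        proc_names[i:i + procs_per_function]
        for i in range(0, len(proc_names), procs_per_function)
    ]


procs = ["p1", "p2", "p3", "p4", "p5"]
one_each = group_procedures(procs, 1)
two_each = group_procedures(procs, 2)
```

With N = 1 each procedure is on its own; with N = 2 the five procedures in the example end up in three groups, the last holding a single procedure.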

--no-local-thread-engine-base

Do not copy the thread-local Mercury engine base address into a local variable, even when this would be appropriate. This option is effective only in low-level parallel grades that do not use the GNU C global register variables extension.

--inline-alloc

Inline calls to GC_malloc(). This can improve performance a fair bit, but may significantly increase code size. This option is meaningful only if the selected garbage collector is boehm, and if the C compiler is gcc.

Optimization level 6 automatically sets --inline-alloc.

--use-macro-for-redo-fail

Emit the fail or redo macro instead of a branch to the fail or redo code in the runtime system. This produces slightly bigger but slightly faster code.

Optimization level 6 automatically sets --use-macro-for-redo-fail.