C Coding Standard for the Mercury Project

These coding guidelines are presented in the briefest manner possible and therefore do not include rationales.

Because the coding standard has been kept deliberately brief, there are some items missing that would be included in a more comprehensive standard. For more on commonsense C programming, consult the Indian Hill C coding standard or the comp.lang.c FAQ.

1. File organization

1.1. Modules and interfaces

We impose a discipline on C to allow us to emulate (poorly) the modules of languages such as Ada and Modula-3.

1.2. Organization within a file

1.2.1. Source files

Items in source files should in general be in this order:

Within each section, items should generally be listed in top-down order, not bottom-up. That is, if foo() calls bar(), then the definition of foo() should precede the definition of bar(). (An exception to this rule is functions that are explicitly declared inline; in that case, the definition should precede the call, to make it easier for the C compiler to perform the desired inlining.)
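The top-down rule and its inline exception can be sketched as follows. This is only an illustration: every name in it (MR_example_mean_square and the example_ helpers) is hypothetical and not part of the Mercury sources. The prototype for the file-static helper is what allows the caller to appear first.

```c
#include <assert.h>

// Prototypes for local (file-static) functions, so that the
// definitions below can appear in top-down order.
static int  example_sum_of_squares(const int *items, int count);

// Explicitly inline helper: defined before its caller, which is the
// exception to the top-down rule.
static inline int
example_square(int x) {
    return x * x;
}

// Return the (truncated) mean of the squares of the first `count' items.
// Hypothetical exported function; it calls example_sum_of_squares(),
// so its definition comes first.
int
MR_example_mean_square(const int *items, int count) {
    assert(count > 0);
    return example_sum_of_squares(items, count) / count;
}

// Return the sum of the squares of the first `count' items.
static int
example_sum_of_squares(const int *items, int count) {
    int total = 0;
    int i;

    for (i = 0; i < count; i++) {
        total += example_square(items[i]);
    }
    return total;
}
```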

1.2.2. Header files

Items in headers should in general be in this order:

However, it is probably more important to group items which are conceptually related than to strictly follow this order.

Also note that #defines which either define configuration macros used for conditional compilation, or define constants that are used for array sizes, will need to come before the code that uses them. But in general, configuration macros should be isolated in separate files (e.g. runtime/mercury_conf.h.in and runtime/mercury_conf_param.h), and fixed-length limits should be avoided, so those cases should not arise often.

Every header should be protected against multiple inclusion using the following idiom:

#ifndef MODULE_H
#define MODULE_H

... body of module.h ...

#endif // not MODULE_H
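As a rough illustration of what the idiom buys, the sketch below simulates including the same header twice by repeating its text in one file. The header name, the MR_ExamplePoint type and MR_example_point_sum() are hypothetical, chosen only for this example. Without the guard, the second copy of the struct definition would be a compile error.

```c
#include <assert.h>
#include <stddef.h>

// First inclusion: MR_EXAMPLE_POINT_H is not yet defined, so the body
// is compiled and the guard macro becomes defined.
#ifndef MR_EXAMPLE_POINT_H
#define MR_EXAMPLE_POINT_H

typedef struct MR_ExamplePoint_Struct {
    int x;
    int y;
} MR_ExamplePoint;

#endif // not MR_EXAMPLE_POINT_H

// Second inclusion (simulated by repeating the text): the guard macro
// is now defined, so the preprocessor skips the body and the struct
// is not defined a second time.
#ifndef MR_EXAMPLE_POINT_H
#define MR_EXAMPLE_POINT_H

typedef struct MR_ExamplePoint_Struct {
    int x;
    int y;
} MR_ExamplePoint;

#endif // not MR_EXAMPLE_POINT_H

// Return the sum of the coordinates of `point' (hypothetical function).
int
MR_example_point_sum(const MR_ExamplePoint *point) {
    assert(point != NULL);
    return point->x + point->y;
}
```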

2. Comments

2.1. What should be commented

2.1.1. Functions

Each function should have a one-sentence description of what it does.

Note: memory allocation for C code that must interface with Mercury code or the Mercury runtime should be done using the routines defined and documented in mercury/runtime/mercury_memory.h and/or mercury/runtime/mercury_heap.h, according to the documentation in those files, in mercury/trace/README, and in the Mercury Language Reference Manual.

Such function comments should be present in header files for each function exported from a source file. Ideally, a client of the module should not have to look at the implementation, only the interface. In C terminology, the header should suffice for working out how an exported function works.
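One possible shape for such a comment is sketched here. The function MR_example_count_char() is hypothetical, and the Inputs/Output/Side effects layout is an assumption about what a client would want to know, not a format this standard prescribes. The declaration and its comment are what would live in the header; the definition would live in the source file.

```c
#include <assert.h>
#include <stddef.h>

// Count the occurrences of the character `target' in the
// NUL-terminated string `str'.
//
// Inputs:  `str' must be non-NULL; it is not modified.
// Output:  the number of matches (0 if there are none).
// Side effects: none; no memory is allocated.
extern size_t   MR_example_count_char(const char *str, char target);

// Definition (would normally be in the .c file, not the header).
size_t
MR_example_count_char(const char *str, char target) {
    size_t  count = 0;

    assert(str != NULL);
    for (; *str != '\0'; str++) {
        if (*str == target) {
            count++;
        }
    }
    return count;
}
```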

2.1.2. Macros

Each non-trivial macro should be documented just as for functions (see above). It is also a good idea to document the types of macro arguments and return values, e.g. by including a function declaration in a comment.
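A minimal sketch of a macro documented this way, with a function declaration in the comment to convey the intended argument and result types. The macros MR_example_max and MR_example_min and the function MR_example_clamp() are hypothetical names used only for illustration.

```c
#include <assert.h>

// Return the larger of `a' and `b'. Each argument may be evaluated
// twice, so arguments must not have side effects.
//
//  int MR_example_max(int a, int b);
//
#define MR_example_max(a, b)    (((a) > (b)) ? (a) : (b))

// Return the smaller of `a' and `b'. Each argument may be evaluated
// twice, so arguments must not have side effects.
//
//  int MR_example_min(int a, int b);
//
#define MR_example_min(a, b)    (((a) < (b)) ? (a) : (b))

// Return `value' limited to the closed interval [lo, hi].
int
MR_example_clamp(int value, int lo, int hi) {
    assert(lo <= hi);
    return MR_example_min(MR_example_max(value, lo), hi);
}
```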

2.1.3. Global variables

Any global variable should be excruciatingly documented; it should be crystal clear what every one of its possible values would mean. This is especially true when globals are exported from a module. In general, there are very few circumstances that justify use of a global.

2.2. Comment style

New comments should use this form:
    // Here is a comment.
    // And here is some more comment.
Older comments had this form:
    /*
    ** Here is a comment.
    ** And here's some more comment.
    */
New annotations to a single line of code should use this form:
    i += 3; // Here is a comment about this line of code.
Older annotations had this form:
    i += 3; /* Here's a comment about this line of code. */

2.3. Guidelines for comments

2.3.1. Revisits

Any code that needs to be revisited because it is a temporary hack (or some other expediency) must have a comment of the form:
    // XXX: <reason for revisit>
The <reason for revisit> should explain the problem in a way that can be understood by developers other than the author of the comment.

2.3.2. Comments on preprocessor statements

The #ifdef constructs should be commented like so if they extend for more than a few lines of code:
#ifdef SOME_VAR
    ...
#else   // not SOME_VAR
    ...
#endif  // not SOME_VAR
Similarly for #ifndef.

Use the GNU convention of comments that indicate whether the variable is true in the #if and #else parts of an #ifdef or #ifndef. For instance:

#ifdef SOME_VAR
    ...
#endif // SOME_VAR

#ifdef SOME_VAR
    ...
#else  // not SOME_VAR
    ...
#endif // not SOME_VAR

#ifndef SOME_VAR
    ...
#else  // SOME_VAR
    ...
#endif // SOME_VAR

3. Declarations

3.1. Pointer declarations

Attach the pointer qualifier to the variable name.
    char    *str1, *str2;

3.2. Static and extern declarations

Limit module exports to the absolute essentials. Make as much static (that is, local) as possible, since this keeps interfaces to modules simpler.
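As a sketch of this rule, the module fragment here exports a single function and keeps both its state and its helper static. All names are hypothetical. The file-local variable also carries the kind of comment asked for in section 2.1.3, stating what its values mean.

```c
#include <assert.h>

// Next identifier to hand out. Starts at 1 and only ever increases;
// 0 is never a valid identifier. File-local, so clients cannot
// modify it directly.
static unsigned long    example_next_id = 1;

static int  example_ids_exhausted(void);

// Return an identifier that no earlier call has returned.
// This is the only symbol the module exports.
unsigned long
MR_example_fresh_id(void) {
    assert(!example_ids_exhausted());
    return example_next_id++;
}

// Return 1 if the counter has wrapped around to 0, and 0 otherwise.
static int
example_ids_exhausted(void) {
    return example_next_id == 0;
}
```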

3.3. Typedefs

Use typedefs to make code self-documenting. They are especially useful on structs, unions, and enums.

4. Naming conventions

4.1. Functions, function-like macros, and variables

Use all lowercase with underscores to separate words (apart from any package prefix such as MR_). For instance, MR_soul_machine.

4.2. Enumeration constants, #define constants, and non-function-like macros

Use all uppercase with underscores to separate words. For instance, ML_MAX_HEADROOM.

4.3. Typedefs

Other than the MR_ prefix, we use CamelCase for type names composed of more than one word. This means that each word has uppercase only for its first letter, and that the boundaries between successive words are indicated only by the change in capitalization. For instance, MR_DirectoryEntry.

4.4. Structs and unions

If something is both a struct and a typedef, the name for the struct should be formed by appending `_Struct' to the typedef name:
    typedef struct MR_DirectoryEntry_Struct {
        ...
    } MR_DirectoryEntry;
For unions, append `_Union' to the typedef name.

4.5. Mercury specifics

Every symbol that is externally visible (i.e. declared in a header file) should be prefixed with a prefix that is specific to the package that it comes from. For anything exported from mercury/runtime, prefix it with MR_. For anything exported from mercury/library, prefix it with ML_.
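The naming rules in this section can be seen together in the following sketch. Every identifier in it is hypothetical; the MR_ prefix is used only to show where a package prefix would go, not to suggest that these symbols exist in mercury/runtime.

```c
#include <assert.h>
#include <stddef.h>

// Enumeration constants: all uppercase with underscores.
// Typedef name: CamelCase after the prefix.
typedef enum {
    MR_EXAMPLE_KIND_INT,
    MR_EXAMPLE_KIND_FLOAT
} MR_ExampleKind;

// A union that is also a typedef: the tag is the typedef name
// with `_Union' appended.
typedef union MR_ExampleValue_Union {
    int     int_value;
    double  float_value;
} MR_ExampleValue;

// A struct that is also a typedef: the tag is the typedef name
// with `_Struct' appended.
typedef struct MR_ExampleCell_Struct {
    MR_ExampleKind  kind;
    MR_ExampleValue value;
} MR_ExampleCell;

// Function name: lowercase with underscores after the prefix.
// Return the value stored in `cell', converted to double.
double
MR_example_cell_as_double(const MR_ExampleCell *cell) {
    assert(cell != NULL);
    switch (cell->kind) {
        case MR_EXAMPLE_KIND_INT:
            return (double) cell->value.int_value;
        case MR_EXAMPLE_KIND_FLOAT:
            return cell->value.float_value;
        default:
            // Not reached for a valid `kind'.
            assert(0);
            return 0.0;
    } // switch
}
```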

5. Syntax and layout

5.1. Minutiae

5.2. Statements

Use one statement per line. What follows are example layout styles for the various syntactic constructs.

5.2.1. If statement

Always put braces around the then-part, and the else-part if it exists, even if they contain only a single statement. If an if-then-else statement is longer than a page, consider whether it can be shortened by moving some of its code to named functions. If you decide against this, then add an "// end if" comment at the end.
// Curlies are placed according to the K&R one true brace style.
// And comments look like this.
if (blah) {
    // Always use curlies, even when there is only one statement
    // in the block.
} else {
    ...
} // end if

// If the condition is so long that the open curly doesn't fit
// on the same line as the `if', put it on a line of its own.
if (a_very_long_condition() &&
    another_long_condition_that_forces_a_line_wrap())
{
    ...
}

5.2.2. Functions

Function names are flush against the left margin. This makes it easier to grep for function definitions (as opposed to their invocations). In argument lists, put a space after each comma. If the function is longer than a page, add an "// end func()" comment after the curly that ends its definition, where func is the name of the function.
int
rhododendron(int a, float b, double c) {
    ...
} // end rhododendron()

5.2.3. Variables

Variable names in variable declarations and definitions shouldn't be flush left, however; they should be preceded by the type.
int x = 0, y = 3, z;

int a[] = {
    1,2,3,4,5
};

5.2.4. Switches

switch (blah) {
    case BLAH1:
        ...
        break;
    case BLAH2: {
        int i;

        ...
        break;
    }
    default:
        ...
        break;
} // switch

5.2.5. Structs, unions, and enums

struct Point {
    int     tag;
    union   cool {
        int     ival;
        double  dval;
    } cool;
};
enum Stuff {
    STUFF_A, STUFF_B ...
};

5.2.6. Loops

while (stuff) {
    ...
}

do {
    ...
} while (stuff);

for (this; that; those) {
    // Always use curlies, even if there is no body.
}

// If no body, do this...
while (stuff) {
    // Do nothing.
}
for (this; that; those) {
    // Do nothing.
}

5.3. Preprocessing

5.3.1. Nesting

Nested #ifdefs, #ifndefs and #ifs should be indented by two spaces for each level of nesting. For example:
#ifdef GUAVA
  #ifndef PAPAYA
  #else  // PAPAYA
  #endif // PAPAYA
#else  // not GUAVA
#endif // not GUAVA

6. Portability

6.1. Architecture specifics

Avoid relying on properties of a specific machine architecture unless necessary, and if necessary localise such dependencies. One solution is to have architecture-specific macros to hide access to machine-dependent code. Some machine-specific properties are:

6.2. Operating system specifics

Operating system APIs differ from platform to platform. Although most support standard POSIX calls such as `read', `write' and `unlink', you cannot rely on the presence of, for instance, System V shared memory, or BSD sockets.

Adhere to POSIX-supported operating system calls whenever possible since they are widely supported, even by Windows and VMS.

When POSIX doesn't provide the required functionality, ensure that the operating system specific calls are localised.

6.3. Compiler and C library specifics

ANSI C compilers are now widespread and hence we needn't pander to old K&R compilers. However, compilers (in particular the GNU C compiler) often provide non-ANSI extensions. Ensure that any use of compiler extensions is localised and protected by #ifdefs.
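One way to localise an extension is to wrap it in a single macro with a portable fallback, as in this sketch. The macro MR_example_unlikely and the function MR_example_safe_divide() are hypothetical; __builtin_expect is the GNU C branch-prediction hint, and the fallback simply evaluates the condition.

```c
#include <assert.h>
#include <limits.h>

// The only place that mentions the GNU C extension. Code that uses
// MR_example_unlikely compiles unchanged with any ANSI C compiler.
//
//  int MR_example_unlikely(int cond);
//
#ifdef __GNUC__
  #define MR_example_unlikely(cond)   __builtin_expect(!!(cond), 0)
#else  // not __GNUC__
  #define MR_example_unlikely(cond)   (cond)
#endif // not __GNUC__

// Return numerator / denominator, or `fallback' if denominator is zero.
int
MR_example_safe_divide(int numerator, int denominator, int fallback) {
    // INT_MIN / -1 overflows, so callers must not request it.
    assert(!(numerator == INT_MIN && denominator == -1));
    if (MR_example_unlikely(denominator == 0)) {
        return fallback;
    }
    return numerator / denominator;
}
```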

Don't rely on features whose behaviour is undefined according to the ANSI C standard. For that matter, don't rely on C arcana even if they are defined. For instance, setjmp/longjmp and ANSI signals often have subtle differences in behaviour between platforms.

If you write threaded code, make sure any non-reentrant code is appropriately protected via mutual exclusion. The biggest cause of non-reentrant (non-threadsafe) code is function-static data. Note that some C library functions may be non-reentrant. This may or may not be documented in their man pages.
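The sketch here shows why function-static data is a problem and one way around it. Rather than adding a mutex, it removes the shared state by having the caller supply the buffer; that is a substitute technique, not the mutual exclusion the paragraph above describes. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Non-reentrant: every call shares one function-static buffer, so a
// second call (or a call from another thread) overwrites the result
// of the first.
const char *
MR_example_label_static(int id) {
    static char buffer[32];

    snprintf(buffer, sizeof(buffer), "item-%d", id);
    return buffer;
}

// Reentrant alternative: the caller owns the buffer, so calls that
// use distinct buffers cannot interfere with each other.
void
MR_example_label_reentrant(int id, char *buffer, size_t size) {
    assert(buffer != NULL && size > 0);
    snprintf(buffer, size, "item-%d", id);
}
```

With the static version, two results obtained in succession point at the same storage, so the first one silently changes when the second call is made.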

6.4. Environment specifics

This is one of the most important sections in the coding standard. Here we mention what other tools Mercury depends on. Mercury must depend on some tools; however, every tool that is needed to use Mercury reduces the potential user base.

Bear this in mind when tempted to add YetAnotherTool(TM).

6.4.1. Tools required for Mercury

In order to run Mercury (given that you have the binary installation), you need:

In order to build the Mercury compiler, you need the above and also:

In order to modify and maintain the source code of the Mercury compiler, you need the above and also:

6.4.2. Documenting the tools

If further tools are required, you should add them to the above list. And similarly, if you eliminate dependence on a tool, remove it from the above list.

7. Coding specifics


Note: This coding standard is an amalgam of suggestions from the entire Mercury team, not necessarily the opinion of any single author.
