
2.2 Tokens

Tokens in Mercury are the same as in ISO Prolog. The only differences are the ‘#line’ token, which is used as a line number directive (see below), and the backquote (‘`’) token.

The different tokens are as follows. Tokens may be separated by whitespace or line number directives.

line number directive

A line number directive consists of the character ‘#’, a positive integer specifying the line number, and then a newline. A ‘#line’ directive’s only role is to specify the line number; it is otherwise ignored by the syntax. Line number directives may occur anywhere a token may occur. They are used in conjunction with the ‘pragma source_file’ declaration to indicate that the Mercury code following was generated by another tool. They associate each line in the Mercury code with the source file name and line number of the original source from which the Mercury code was derived, so that the Mercury compiler can issue more informative error messages using the original source code locations. A ‘#line’ directive specifies the line number for the immediately following line. Line numbers for lines after that are incremented as usual, so the second line after a ‘#100’ directive would be considered to be line number 101.
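
The numbering rule can be illustrated with a minimal Python sketch. It is not how the Mercury compiler is implemented: the function name `reported_line_numbers` is hypothetical, and the sketch assumes that each directive sits on a line of its own and is written without leading zeros.

```python
import re

# Assumption: a directive occupies a whole line, e.g. "#100", with no
# leading zeros in the number.
DIRECTIVE = re.compile(r"#([1-9][0-9]*)")


def reported_line_numbers(lines):
    """Pair each non-directive line with the line number it is reported under."""
    result = []
    current = 1  # number that the next physical line will receive
    for text in lines:
        match = DIRECTIVE.fullmatch(text)
        if match:
            # The directive sets the number of the immediately following line.
            current = int(match.group(1))
            continue
        result.append((current, text))
        current += 1  # later lines are incremented as usual
    return result


generated = [
    ":- module demo.",
    "#100",
    "p(1).",
    "p(2).",
]
numbered = reported_line_numbers(generated)
```

Here `p(1).` is reported as line 100 and `p(2).`, the second line after the directive, as line 101.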


A string is a sequence of characters enclosed in double quotes (").

Within a string, two adjacent double quotes stand for a single double quote. For example, the string ‘ """" ’ is a string of length one, containing a single double quote: the outermost pair of double quotes encloses the string, and the innermost pair stand for a single double quote.

Strings may also contain backslash escapes. ‘\a’ stands for “alert” (a beep character), ‘\b’ for backspace, ‘\r’ for carriage-return, ‘\f’ for form-feed, ‘\t’ for tab, ‘\n’ for newline, ‘\v’ for vertical-tab. An escaped backslash, single-quote, or double-quote stands for itself.

The sequence ‘\x’ introduces a hexadecimal escape; it must be followed by a sequence of hexadecimal digits and then a closing backslash. It is replaced with the character whose character code is identified by the hexadecimal number. Similarly, a backslash followed by an octal digit is the beginning of an octal escape; as with hexadecimal escapes, the sequence of octal digits must be terminated with a closing backslash.

The sequence ‘\u’ or ‘\U’ can be used to escape Unicode characters. ‘\u’ must be followed by the Unicode character code expressed as four hexadecimal digits. ‘\U’ must be followed by the Unicode character code expressed as eight hexadecimal digits. The highest allowed value is ‘\U0010FFFF’.

A backslash followed immediately by a newline is deleted; thus an escaped newline can be used to continue a string over more than one source line. (String literals may also contain embedded newlines.)
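
The escape rules for strings can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the compiler's lexer: `decode_string_body` is a hypothetical name, it takes only the text between the enclosing double quotes, and reporting malformed input with `ValueError` is a choice made for this example.

```python
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "r": "\r", "f": "\f", "t": "\t",
    "n": "\n", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}
HEX_DIGITS = "0123456789abcdefABCDEF"
OCTAL_DIGITS = "01234567"


def decode_string_body(body):
    """Decode the text between the enclosing double quotes of a string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            # Only a doubled double quote may appear inside the string.
            if body[i + 1:i + 2] != '"':
                raise ValueError("unescaped double quote")
            out.append('"')
            i += 2
        elif ch != "\\":
            out.append(ch)
            i += 1
        else:
            nxt = body[i + 1:i + 2]
            if nxt == "":
                raise ValueError("dangling backslash")
            if nxt == "\n":
                i += 2  # backslash-newline is deleted
            elif nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
                i += 2
            elif nxt == "x" or nxt in OCTAL_DIGITS:
                # Hexadecimal and octal escapes both end with a backslash.
                if nxt == "x":
                    base, allowed, start = 16, HEX_DIGITS, i + 2
                else:
                    base, allowed, start = 8, OCTAL_DIGITS, i + 1
                end = body.find("\\", start)
                digits = body[start:end] if end != -1 else ""
                if not digits or any(d not in allowed for d in digits):
                    raise ValueError("malformed numeric escape")
                out.append(chr(int(digits, base)))
                i = end + 1
            elif nxt in ("u", "U"):
                width = 4 if nxt == "u" else 8
                digits = body[i + 2:i + 2 + width]
                if len(digits) != width or any(
                    d not in HEX_DIGITS for d in digits
                ):
                    raise ValueError("malformed Unicode escape")
                code = int(digits, 16)
                if code > 0x10FFFF:
                    raise ValueError("code point above 10FFFF")
                out.append(chr(code))
                i += 2 + width
            else:
                raise ValueError("unknown escape")
    return "".join(out)


# The body of the four-quote example string is two double quotes.
single_quote_char = decode_string_body('""')
```

For instance, the bodies `\x41\` and `\101\` both decode to the letter A, since hexadecimal 41 and octal 101 are the same character code.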


A name is either an unquoted name or a quoted name. An unquoted name is a lowercase letter followed by zero or more letters, underscores, and digits. A quoted name is any sequence of zero or more characters enclosed in single quotes ('). Within a quoted name, two adjacent single quotes stand for a single single quote. Quoted names can also contain backslash escapes of the same form as for strings.
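
As a rough illustration of the two forms, the following Python sketch recognizes a name token and returns the name it denotes. The helper `name_text` is hypothetical, letters are assumed to be ASCII, and backslash escapes inside quoted names are deliberately not handled: they are left undecoded, and a backslash-escaped single quote is rejected.

```python
import re

# Unquoted name: a lowercase letter, then letters, underscores, and digits.
UNQUOTED_NAME = re.compile(r"[a-z][A-Za-z0-9_]*")


def name_text(token):
    """Return the name a token denotes, or None if it is not a name token."""
    if UNQUOTED_NAME.fullmatch(token):
        return token
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        body = token[1:-1]
        # Every single quote inside the body must belong to a doubled pair.
        if "'" in body.replace("''", ""):
            return None
        return body.replace("''", "'")
    return None


examples = {
    "foo_Bar1": name_text("foo_Bar1"),
    "'hello world'": name_text("'hello world'"),
    "'it''s'": name_text("'it''s'"),
    "Foo": name_text("Foo"),
}
```

Note that `''` is a valid quoted name denoting the empty name, whereas `Foo` is not a name at all because it starts with an uppercase letter.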


A variable is an uppercase letter or underscore followed by zero or more letters, underscores, and digits. A variable token consisting of a single underscore is treated specially: each instance of ‘_’ denotes a distinct variable. (In addition, variables starting with an underscore are presumed to be “don’t-care” variables; the compiler will issue a warning if a variable that does not start with an underscore occurs only once, or if a variable starting with an underscore occurs more than once in the same scope.)
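
The warning rule can be sketched in Python as follows. This is only an illustration: `occurrence_warnings` is a hypothetical helper, the input is assumed to be the list of variable tokens found in one scope, letters are assumed to be ASCII, and the warning texts are invented for the example.

```python
import re
from collections import Counter

# Variable: an uppercase letter or underscore, then letters, underscores, digits.
VARIABLE = re.compile(r"[A-Z_][A-Za-z0-9_]*")


def occurrence_warnings(variables):
    """Return (variable, reason) pairs for the variables in one scope."""
    for token in variables:
        if not VARIABLE.fullmatch(token):
            raise ValueError("not a variable token: " + token)
    # Each '_' denotes a distinct variable, so it is never counted.
    counts = Counter(v for v in variables if v != "_")
    warnings = []
    for name, count in counts.items():
        if name.startswith("_") and count > 1:
            warnings.append((name, "underscore variable occurs more than once"))
        elif not name.startswith("_") and count == 1:
            warnings.append((name, "variable occurs only once"))
    return warnings


scope = ["X", "Y", "X", "_", "_", "_Tmp", "_Tmp"]
found = occurrence_warnings(scope)
```

In this scope, `Y` is flagged because it occurs once, `_Tmp` because it occurs twice, and the two occurrences of `_` are ignored.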


An integer is either a decimal, binary, octal, hexadecimal, or character-code literal. A decimal literal is any sequence of decimal digits. A binary literal is ‘0b’ followed by any sequence of binary digits. An octal literal is ‘0o’ followed by any sequence of octal digits. A hexadecimal literal is ‘0x’ followed by any sequence of hexadecimal digits. A character-code literal is ‘0'’ followed by any single character.
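
The five forms of integer literal can be told apart with a small Python sketch. The function `integer_value` is hypothetical; the sketch assumes that each prefixed form needs at least one digit and that the prefixes are lowercase, as written above.

```python
import re

# (prefix, base, pattern for the digits after the prefix)
PREFIXED_FORMS = (
    ("0b", 2, r"[01]+"),
    ("0o", 8, r"[0-7]+"),
    ("0x", 16, r"[0-9a-fA-F]+"),
)


def integer_value(token):
    """Return the value of an integer literal, or None if it is not one."""
    if re.fullmatch(r"[0-9]+", token):
        return int(token, 10)
    for prefix, base, digits in PREFIXED_FORMS:
        if token.startswith(prefix) and re.fullmatch(digits, token[2:]):
            return int(token[2:], base)
    # Character-code literal: 0' followed by exactly one character.
    if token.startswith("0'") and len(token) == 3:
        return ord(token[2])
    return None


values = [integer_value(t) for t in ("42", "0b101", "0o17", "0xff", "0'a")]
```

The character-code literal `0'a` denotes the character code of the letter a, which is 97.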


A floating point literal consists of a sequence of decimal digits, then a decimal point and a sequence of digits (the fraction part), then the letter ‘E’ (or ‘e’), an optional sign (‘+’ or ‘-’), and another sequence of decimal digits (the exponent). The fraction part or the exponent (but not both) may be omitted.
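
One way to read this rule is as a regular expression with two optional groups, at least one of which must be present. The Python sketch below takes that reading; `is_float_literal` is a hypothetical name, and the sketch assumes that the fraction part includes the decimal point and that the exponent includes the letter ‘E’ or ‘e’.

```python
import re

# Group 1 is the fraction part, group 2 is the exponent.
FLOAT = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def is_float_literal(token):
    """Check a token against the floating point literal rule."""
    match = FLOAT.fullmatch(token)
    if match is None:
        return False
    # Either part may be omitted, but not both.
    return match.group(1) is not None or match.group(2) is not None


accepted = [t for t in ("1.5e10", "1.5", "1e-3", "1", ".5", "1.") if is_float_literal(t)]
```

Under these assumptions `1.5e10`, `1.5`, and `1e-3` are accepted, while `1` is rejected because both optional parts are missing.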


An implementation-defined literal consists of a dollar sign ($) followed by an unquoted name.


A left parenthesis, ‘(’, that is not preceded by whitespace.


A left parenthesis, ‘(’, that is preceded by whitespace.


A right parenthesis, ‘)’.


A left square bracket, ‘[’.


A right square bracket, ‘]’.


A left curly bracket, ‘{’.


A right curly bracket, ‘}’.


A “head-tail separator”, i.e. a vertical bar, ‘|’.


A comma, ‘,’.


A full stop (period), ‘.’.


The end of file.
