The Mercury Language Reference Manual: Tokens

2.4 Tokens

The different tokens in Mercury are as follows. Tokens may be separated by whitespace.

line number directive

A line number directive consists of the character ‘#’, a positive integer specifying the line number, and then a newline. A ‘#line’ directive’s only role is to specify the line number; it is otherwise ignored by the syntax. Line number directives may occur anywhere a token may occur. They are used in conjunction with the ‘pragma source_file’ declaration to indicate that the Mercury code following was generated by another tool; they serve to associate each line in the Mercury code with the source file name and line number of the original source from which the Mercury code was derived, so that the Mercury compiler can issue more informative error messages using the original source code locations. A ‘#line’ directive specifies the line number for the immediately following line. Line numbers for lines after that are incremented as usual, so the second line after a ‘#100’ directive would be considered to be line number 101.
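As an illustration, the following sketch shows how a tool that generates Mercury code might combine the two. The module name and the file name ‘greeting.tmpl’ are hypothetical and stand for whatever source the generator read.

```mercury
:- module line_directive_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.

% Hypothetical original source file; later messages should refer to it.
:- pragma source_file("greeting.tmpl").
#100
main(!IO) :-
    io.write_string("hello\n", !IO).
```

Here the line containing ‘main(!IO) :-’ is treated as line 100 and the line after it as line 101.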

string

A string is a sequence of characters enclosed in double quotes (").

Within a string, two adjacent double quotes stand for a single double quote. For example, the string ‘ """" ’ is a string of length one, containing a single double quote: the outermost pair of double quotes encloses the string, and the innermost pair stand for a single double quote.
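A minimal sketch of the doubled-quote rule follows. The module name is an assumption made for illustration.

```mercury
:- module doubled_quote_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

main(!IO) :-
    % Four double quotes: the outer pair delimits the string, the inner
    % pair stands for one double quote character.
    Quote = """",
    io.write_int(string.length(Quote), !IO),
    io.nl(!IO).
```

The program prints the length of the string, which is 1.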

Strings may also contain backslash escapes. ‘\a’ stands for “alert” (a beep character), ‘\b’ for backspace, ‘\r’ for carriage-return, ‘\f’ for form-feed, ‘\t’ for tab, ‘\n’ for newline, ‘\v’ for vertical-tab. An escaped backslash, single-quote, or double-quote stands for itself.

The sequence ‘\x’ introduces a hexadecimal escape; it must be followed by a sequence of hexadecimal digits and then a closing backslash. It is replaced with the character whose character code is identified by the hexadecimal number. Similarly, a backslash followed by an octal digit is the beginning of an octal escape; as with hexadecimal escapes, the sequence of octal digits must be terminated with a closing backslash.

The sequence ‘\u’ or ‘\U’ can be used to escape Unicode characters. ‘\u’ must be followed by the Unicode character code expressed as four hexadecimal digits. ‘\U’ must be followed by the Unicode character code expressed as eight hexadecimal digits. The highest allowed value is ‘\U0010FFFF’.

A backslash followed immediately by a newline is deleted; thus an escaped newline can be used to continue a string over more than one source line. (String literals may also contain embedded newlines.)
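The escape forms described above can be sketched as follows. The module and variable names are assumptions made for illustration. Note the closing backslash on the hexadecimal and octal escapes, which the ‘\u’ and ‘\U’ forms do not take.

```mercury
:- module string_escapes_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

main(!IO) :-
    % Four ways to write the one-character string "A"
    % (character code 65 = hexadecimal 41 = octal 101).
    Hex = "\x41\",
    Oct = "\101\",
    Short = "\u0041",
    Long = "\U00000041",
    io.write_string(Hex ++ Oct ++ Short ++ Long, !IO),
    io.nl(!IO),

    % The backslash and the newline after it are deleted, so this
    % literal contains no newline character.
    Joined = "one line, \
still the same line",
    io.write_string(Joined, !IO),
    io.nl(!IO).
```

The first line of output should be ‘AAAA’, and the second should be the continued string on a single line.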

name

A name is either an unquoted name or a quoted name. An unquoted name is a lowercase letter followed by zero or more letters, underscores, and digits. A quoted name is any sequence of zero or more characters enclosed in single quotes ('). Within a quoted name, two adjacent single quotes stand for a single single quote. Quoted names can also contain backslash escapes of the same form as for strings.

Note that if a character is an operator, then the result of enclosing that character in single quotes is also an operator. For example, since ‘:’ is an operator, ‘':'’ is an operator as well, which means that ‘Char = ':'’ is not valid code. To make it valid, put parentheses around the operator, which prevents the scope of the operator from extending to the surrounding code. Thus ‘Char = (':')’ is valid.
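The following sketch shows quoted names and the parenthesised operator in context. The module name and the type ‘mood’ are hypothetical and exist only for illustration.

```mercury
:- module quoted_name_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.

:- type mood
    --->    happy               % unquoted name
    ;       'not so happy'      % quoted name containing spaces
    ;       'can''t say'.       % '' stands for one single quote

main(!IO) :-
    Mood = 'can''t say',
    io.write_line(Mood, !IO),

    % The parentheses keep the quoted operator ':' from being parsed
    % as an operator applied to the surrounding code.
    Char = (':'),
    io.write_char(Char, !IO),
    io.nl(!IO).
```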

variable

A variable is an uppercase letter or underscore followed by zero or more letters, underscores, and digits. A variable token consisting of a single underscore is treated specially: each instance of ‘_’ denotes a distinct variable. (In addition, variables starting with an underscore are presumed to be “don’t-care” variables; the compiler will issue a warning if a variable that does not start with an underscore occurs only once, or if a variable starting with an underscore occurs more than once in the same scope.)
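A short sketch of the three kinds of variable token follows. The module name and the function ‘first_of_three’ are assumptions made for illustration.

```mercury
:- module variable_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.

:- func first_of_three({int, int, int}) = int.

% `First' is an ordinary variable. `_' is anonymous, and `_Third' is a
% named don't-care variable; neither is required to match anything else.
first_of_three({First, _, _Third}) = First.

main(!IO) :-
    io.write_int(first_of_three({1, 2, 3}), !IO),
    io.nl(!IO).
```

The program prints 1. If the clause head used ‘{First, _, _}’ instead, the two ‘_’ tokens would still denote distinct variables, so the second and third components would not have to be equal.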

integer

An integer is either a decimal, binary, octal, hexadecimal, or character-code literal. A decimal literal is any sequence of decimal digits. A binary literal is ‘0b’ followed by any sequence of binary digits. An octal literal is ‘0o’ followed by any sequence of octal digits. A hexadecimal literal is ‘0x’ followed by any sequence of hexadecimal digits. A character-code literal is ‘0'’ followed by any single character.
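The five literal forms can be sketched as follows. The module and variable names are assumptions made for illustration.

```mercury
:- module integer_literal_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.

main(!IO) :-
    Dec = 255,              % decimal
    Bin = 0b11111111,       % binary
    Oct = 0o377,            % octal
    Hex = 0xff,             % hexadecimal
    Code = 0'a,             % character code of the character a
    io.write_line([Dec, Bin, Oct, Hex, Code], !IO).
```

The first four literals all denote the value 255, and the last denotes 97, the character code of ‘a’.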

Decimal, binary, octal and hexadecimal literals may be optionally terminated by a suffix that indicates whether the literal represents a signed or unsigned integer and what the size of that integer is. These suffixes are:

Suffix	Signedness	Size
`i` or no suffix	Signed	Implementation-defined
`i8`	Signed	8-bit
`i16`	Signed	16-bit
`i32`	Signed	32-bit
`i64`	Signed	64-bit
`u`	Unsigned	Implementation-defined
`u8`	Unsigned	8-bit
`u16`	Unsigned	16-bit
`u32`	Unsigned	32-bit
`u64`	Unsigned	64-bit

For decimal, binary, octal and hexadecimal literals, an arbitrary number of underscores (‘_’) may be inserted between the digits. An arbitrary number of underscores may also be inserted between the radix prefix (i.e. ‘0b’, ‘0o’ and ‘0x’) and the initial digit. Similarly, an arbitrary number of underscores may be inserted between the final digit and the signedness suffix. The purpose of the underscores is to improve readability, and they do not affect the numeric value of the literal.
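The suffix and underscore rules above can be sketched as follows. The module and variable names are assumptions made for illustration, and the example assumes a compiler that supports the sized integer types.

```mercury
:- module integer_suffix_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.

main(!IO) :-
    Million = 1_000_000,        % no suffix: signed, implementation-defined size
    Byte = 0x_ff_u8,            % unsigned 8-bit; underscores after the
                                % prefix and before the suffix
    Word = 0b1010_1010i16,      % signed 16-bit
    Big = 18_446_744_073_709_551_615u64,    % largest unsigned 64-bit value
    io.write_line(Million, !IO),
    io.write_line(Byte, !IO),
    io.write_line(Word, !IO),
    io.write_line(Big, !IO).
```

The underscores only group digits visually; ‘1_000_000’ and ‘1000000’ denote the same value.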

float

A floating point literal consists of a sequence of decimal digits, then a decimal point (‘.’) and a sequence of decimal digits (the fraction part), then the letter ‘E’ (or ‘e’), an optional sign (‘+’ or ‘-’), and another sequence of decimal digits (the exponent). The fraction part or the exponent (but not both) may be omitted.

An arbitrary number of underscores (‘_’) may be inserted between the digits in a floating point literal. Underscores may not occur adjacent to any non-digit characters (i.e. ‘.’, ‘e’, ‘E’, ‘+’ or ‘-’) in a floating point literal. The purpose of the underscores is to improve readability, and they do not affect the numeric value of the literal.
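A brief sketch of the permitted forms follows. The module and variable names are assumptions made for illustration.

```mercury
:- module float_literal_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.

main(!IO) :-
    Plain = 1.5,                % fraction part, no exponent
    Full = 15.0e-1,             % fraction part and exponent
    NoFraction = 15E-1,         % exponent, no fraction part
    Grouped = 1_000.000_5,      % underscores between digits only
    io.write_line([Plain, Full, NoFraction, Grouped], !IO).
```

By contrast, a literal such as ‘1_.5’ or ‘1.5e_3’ would be rejected under the rule above, because the underscore is adjacent to a non-digit character.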

implementation_defined_literal

An implementation-defined literal consists of a dollar sign (‘$’) followed by an unquoted name.
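As a sketch, the following uses three such literals. The names ‘$module’, ‘$pred’ and ‘$line’ are not defined in this section; the example assumes they are among those provided by the Melbourne Mercury compiler, where they are replaced at compile time by the module name, the enclosing predicate name and the current line number.

```mercury
:- module impl_defined_literal_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.

main(!IO) :-
    % Assumption: $module and $pred are strings, and $line is an int.
    io.write_string($module, !IO),
    io.nl(!IO),
    io.write_string($pred, !IO),
    io.nl(!IO),
    io.write_int($line, !IO),
    io.nl(!IO).
```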

open_ct

A left parenthesis, ‘(’, that is not preceded by whitespace.

open

A left parenthesis, ‘(’, that is preceded by whitespace.
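The distinction between the two left parenthesis tokens can be seen in a single clause. In this sketch the module name and the function ‘scale’ are assumptions made for illustration.

```mercury
:- module open_token_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.

:- func scale(int, int, int) = int.

% In `scale(A' the parenthesis directly follows the name: open_ct.
% In `= (A' the parenthesis follows whitespace: open.
scale(A, B, C) = (A + B) * C.

main(!IO) :-
    io.write_int(scale(1, 2, 3), !IO),
    io.nl(!IO).
```

The program prints 9.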

close

A right parenthesis, ‘)’.

open_list

A left square bracket, ‘[’.

close_list

A right square bracket, ‘]’.

open_curly

A left curly bracket, ‘{’.

close_curly

A right curly bracket, ‘}’.

ht_sep

A “head-tail separator”, i.e. a vertical bar, ‘|’.

comma

A comma, ‘,’.
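The bracket, separator and comma tokens above typically appear together in list and tuple syntax. In this sketch the module name and the function ‘sum_and_count’ are assumptions made for illustration.

```mercury
:- module bracket_token_demo.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.
:- import_module list.

:- func sum_and_count(list(int)) = {int, int}.

% `[' and `]' delimit a list, `|' separates its head from its tail,
% `{' and `}' delimit a tuple, and `,' separates the elements of both.
sum_and_count([]) = {0, 0}.
sum_and_count([Head | Tail]) = {Head + Sum, 1 + Count} :-
    {Sum, Count} = sum_and_count(Tail).

main(!IO) :-
    {Sum, Count} = sum_and_count([1, 2, 3]),
    io.write_int(Sum, !IO),
    io.nl(!IO),
    io.write_int(Count, !IO),
    io.nl(!IO).
```

The program prints 6 and then 3.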

end

A full stop (period), ‘.’.

eof

The end of file.