Next: , Previous: string.builder, Up: Top   [Contents]


86 string

%--------------------------------------------------%
% vim: ts=4 sw=4 et ft=mercury
%--------------------------------------------------%
% Copyright (C) 1993-2012 The University of Melbourne.
% Copyright (C) 2013-2020 The Mercury team.
% This file is distributed under the terms specified in COPYING.LIB.
%--------------------------------------------------%
%
% File: string.m.
% Main authors: fjh, petdr, wangp.
% Stability: medium to high.
%
% This module provides basic string handling facilities.
%
% Mercury strings are Unicode strings. They use either the UTF-8 or UTF-16
% encoding, depending on the target language.
%
% When Mercury is compiled to C, strings are UTF-8 encoded, with a null
% character as the string terminator. A single code point requires one to four
% bytes (code units) to encode.
%
% When Mercury is compiled to Java, strings are represented using Java's
% String type. When Mercury is compiled to C#, strings are represented using
% C#'s `System.String' type. Both of these types use the UTF-16 encoding.
% A single code point requires one or two 16-bit integers (code units)
% to encode.
%
% The Mercury compiler will only allow well-formed UTF-8 or UTF-16 string
% constants. However, it is possible to produce strings containing invalid
% UTF-8 or UTF-16 via I/O, foreign code, and substring operations.
% Predicates or functions that inspect strings may fail, throw an exception,
% or else behave in some special way when they encounter an ill-formed
% code unit sequence.
%
% Unexpected null characters embedded in the middle of strings can be a source
% of security vulnerabilities, so the Mercury library predicates and functions
% which create strings from (lists of) characters throw an exception if they
% detect such a null character. Programmers must not create strings that might
% contain null characters using the foreign language interface.
%
% The builtin comparison operation on strings is also implementation
% dependent. The current implementation performs string comparison using
%
% - C's strcmp() function, when compiling to C;
% - Java's String.compareTo() method, when compiling to Java; and
% - C#'s System.String.CompareOrdinal() method, when compiling to C#.
%
%--------------------------------------------------%
%
% This module is divided into several sections. These sections are:
%
% - Wrapper types that associate particular semantics with raw strings.
% - Converting between strings and lists of characters.
% - Reading characters from strings.
% - Writing characters to strings.
% - Determining the lengths of strings.
% - Computing hashes of strings.
% - Tests on strings.
% - Appending strings.
% - Splitting up strings.
% - Dealing with prefixes and suffixes.
% - Transformations of strings.
% - Folds over the characters in strings.
% - Formatting tables.
% - Converting strings to docs.
% - Converting strings to values of builtin types.
% - Converting values of builtin types to strings.
% - Converting values of arbitrary types to strings.
% - Converting values to strings based on a format string.
%
%--------------------------------------------------%

:- module string.
:- interface.

:- include_module builder.

:- import_module assoc_list.
:- import_module char.
:- import_module deconstruct.
:- import_module list.
:- import_module maybe.
:- import_module ops.
:- import_module pretty_printer.

%--------------------------------------------------%
%
% Wrapper types that associate particular semantics with raw strings.
%
% These types are used for defining stream typeclass instances
% where you want different instances for strings representing different
% semantic entities. Using the string type itself, without a wrapper,
% would be ambiguous in such situations.
%
% While each module that associates semantics with strings could define
% its own wrapper types, the notions of lines and text files are so common
% that it is simpler to define them just once, and this is the logical
% place to do that.
%

    % A line is:
    %
    % - a possibly empty sequence of non-newline characters terminated by a
    %   newline character; or
    % - a non-empty sequence of non-newline characters terminated by the end
    %   of the file.
    %
:- type line
    --->    line(string).

    % A text file is a possibly empty sequence of characters
    % terminated by the end of the file.
    %
:- type text_file
    --->    text_file(string).

%--------------------------------------------------%
%
% Conversions between strings and lists of characters.
%

    % Convert the string to a list of characters (code points).
    %
    % In the forward mode:
    % If strings use UTF-8 encoding, then each code unit in an ill-formed
    % sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list.
    % If strings use UTF-16 encoding then each unpaired surrogate code point
    % is returned as a separate code point in the list.
    %
    % The reverse mode of the predicate throws an exception if the list
    % contains a null character or code point that cannot be encoded in a
    % string (namely, surrogate code points cannot be encoded in UTF-8
    % strings).
    %
    % The reverse mode of to_char_list/2 is deprecated because the implied
    % ability to round trip convert a string to a list then back to the same
    % string does not hold in the presence of ill-formed code unit sequences.
    %
:- pragma obsolete_proc(to_char_list(uo, in), [from_char_list/2]).
:- func to_char_list(string) = list(char).
:- pred to_char_list(string, list(char)).
:- mode to_char_list(in, out) is det.
:- mode to_char_list(uo, in) is det.

    % Convert the string to a list of characters (code points) in reverse
    % order.
    %
    % In the forward mode:
    % If strings use UTF-8 encoding, then each code unit in an ill-formed
    % sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list.
    % If strings use UTF-16 encoding, then each unpaired surrogate code point
    % is returned as a separate code point in the list.
    %
    % The reverse mode of the predicate throws an exception if the list
    % contains a null character or code point that cannot be encoded in a
    % string (namely, surrogate code points cannot be encoded in UTF-8
    % strings).
    %
    % The reverse mode of to_rev_char_list/2 is deprecated because the implied
    % ability to round trip convert a string to a list then back to the same
    % string does not hold in the presence of ill-formed code unit sequences.
    %
:- pragma obsolete_proc(to_rev_char_list(uo, in), [from_rev_char_list/2]).
:- func to_rev_char_list(string) = list(char).
:- pred to_rev_char_list(string, list(char)).
:- mode to_rev_char_list(in, out) is det.
:- mode to_rev_char_list(uo, in) is det.

    % Convert a list of characters (code points) to a string.
    % Throws an exception if the list contains a null character or code point
    % that cannot be encoded in a string (namely, surrogate code points cannot
    % be encoded in UTF-8 strings).
    %
    % The reverse mode of from_char_list/2 is deprecated because the implied
    % ability to round trip convert a string to a list then back to the same
    % string does not hold in the presence of ill-formed code unit sequences.
    %
:- pragma obsolete_proc(from_char_list(out, in), [to_char_list/2]).
:- func from_char_list(list(char)::in) = (string::uo) is det.
:- pred from_char_list(list(char), string).
:- mode from_char_list(in, uo) is det.
:- mode from_char_list(out, in) is det.

    % As above, but fail instead of throwing an exception if the list contains
    % a null character or code point that cannot be encoded in a string.
    %
:- pred semidet_from_char_list(list(char)::in, string::uo) is semidet.

    % Same as from_char_list, except that it reverses the order
    % of the characters.
    % Throws an exception if the list contains a null character or code point
    % that cannot be encoded in a string (namely, surrogate code points cannot
    % be encoded in UTF-8 strings).
    %
:- func from_rev_char_list(list(char)::in) = (string::uo) is det.
:- pred from_rev_char_list(list(char)::in, string::uo) is det.

    % As above, but fail instead of throwing an exception if the list contains
    % a null character or code point that cannot be encoded in a string.
    %
:- pred semidet_from_rev_char_list(list(char)::in, string::uo) is semidet.

    % Convert a string into a list of code units of the string encoding used
    % by the current process.
    %
:- pred to_code_unit_list(string::in, list(int)::out) is det.

    % Convert a string into a list of UTF-8 code units.
    % Throws an exception if the string contains an unpaired surrogate code
    % point, as the encoding of surrogate code points is prohibited in UTF-8.
    %
:- pred to_utf8_code_unit_list(string::in, list(int)::out) is det.

    % Convert a string into a list of UTF-16 code units.
    % Throws an exception if strings use UTF-8 encoding and the given string
    % contains an ill-formed code unit sequence, as arbitrary bytes cannot be
    % represented in UTF-16 (even allowing for ill-formed sequences).
    %
:- pred to_utf16_code_unit_list(string::in, list(int)::out) is det.

    % Convert a list of code units to a string.
    % Fails if the list does not contain a valid encoding of a string
    % (in the encoding expected by the current process),
    % or if the string would contain a null character.
    %
:- pred from_code_unit_list(list(int)::in, string::uo) is semidet.

    % Convert a list of code units to a string.
    % The resulting string may contain ill-formed sequences.
    % Fails if the list contains a code unit that is out of range
    % or if the string would contain a null character.
    %
:- pred from_code_unit_list_allow_ill_formed(list(int)::in, string::uo)
    is semidet.

    % Convert a list of UTF-8 code units to a string.
    % Fails if the list does not contain a valid encoding of a string
    % or if the string would contain a null character.
    %
:- pred from_utf8_code_unit_list(list(int)::in, string::uo) is semidet.

    % Convert a list of UTF-16 code units to a string.
    % Fails if the list does not contain a valid encoding of a string
    % or if the string would contain a null character.
    %
:- pred from_utf16_code_unit_list(list(int)::in, string::uo) is semidet.

    % duplicate_char(Char, Count, String):
    %
    % Construct a string consisting of Count occurrences of Char code points
    % in sequence, returning the empty string if Count is less than or equal
    % to zero. Throws an exception if Char is a null character or code point
    % that cannot be encoded in a string (namely, surrogate code points cannot
    % be encoded in UTF-8 strings).
    %
:- func duplicate_char(char::in, int::in) = (string::uo) is det.
:- pred duplicate_char(char::in, int::in, string::uo) is det.

%--------------------------------------------------%
%
% Reading characters from strings.
%

    % This type is used by the _repl indexing predicates to distinguish a
    % U+FFFD code point that is actually in a string from a U+FFFD code point
    % generated when the predicate encounters an ill-formed code unit sequence
    % in a UTF-8 string.
    %
:- type maybe_replaced
    --->    not_replaced
    ;       replaced_code_unit(uint8).

    % index(String, Index, Char):
    %
    % If `Index' is the initial code unit offset of a well-formed code unit
    % sequence in `String' then `Char' is the code point encoded by that
    % sequence.
    %
    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
    % code point at `Index' (when strings are UTF-16 encoded).
    %
    % Fails if `Index' is out of range (negative, or greater than or equal to
    % the length of `String').
    %
:- pred index(string::in, int::in, char::uo) is semidet.

    % det_index(String, Index, Char):
    %
    % Like index/3 but throws an exception if `Index' is out of range
    % (negative, or greater than or equal to the length of `String').
    %
:- func det_index(string, int) = char.
:- pred det_index(string::in, int::in, char::uo) is det.

    % unsafe_index(String, Index, Char):
    %
    % Like index/3 but does not check that `Index' is in range.
    %
    % WARNING: behavior is UNDEFINED if `Index' is out of range
    % (negative, or greater than or equal to the length of `String').
    % This version is constant time, whereas det_index
    % may be linear in the length of the string. Use with care!
    %
:- func unsafe_index(string, int) = char.
:- pred unsafe_index(string::in, int::in, char::uo) is det.

    % A synonym for det_index/2:
    % String ^ elem(Index) = det_index(String, Index).
    %
:- func string ^ elem(int) = char.

    % A synonym for unsafe_index/2:
    % String ^ unsafe_elem(Index) = unsafe_index(String, Index).
    %
:- func string ^ unsafe_elem(int) = char.

    % index_next(String, Index, NextIndex, Char):
    %
    % If `Index' is the initial code unit offset of a well-formed code unit
    % sequence in `String' then `Char' is the code point encoded by that
    % sequence, and `NextIndex' is the offset immediately following that
    % sequence.
    %
    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
    % code point at `Index' (when strings are UTF-16 encoded), and `NextIndex'
    % is Index + 1.
    %
    % Fails if `Index' is out of range (negative, or greater than or equal to
    % the length of `String').
    %
:- pred index_next(string::in, int::in, int::out, char::uo) is semidet.

    % index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
    %
    % Like index_next/4 but also returns MaybeReplaced on success.
    % When Char is not U+FFFD, then MaybeReplaced is always `not_replaced'.
    % When Char is U+FFFD (the Unicode replacement character), then there are
    % two cases:
    %
    % - If there is a U+FFFD code point encoded in String at
    %   [Index, NextIndex) then MaybeReplaced is `not_replaced'.
    %
    % - Otherwise, MaybeReplaced is `replaced_code_unit(CodeUnit)' where
    %   CodeUnit is the code unit in String at Index.
    %
:- pred index_next_repl(string::in, int::in, int::out, char::uo,
    maybe_replaced::out) is semidet.

    % unsafe_index_next(String, Index, NextIndex, Char):
    %
    % Like index_next/4 but does not check that `Index' is in range.
    % Fails if `Index' is equal to the length of `String'.
    %
    % WARNING: behavior is UNDEFINED if `Index' is out of range
    % (negative, or greater than the length of `String').
    %
:- pred unsafe_index_next(string::in, int::in, int::out, char::uo) is semidet.

    % unsafe_index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
    %
    % Like index_next_repl/5 but does not check that `Index' is in range.
    % Fails if `Index' is equal to the length of `String'.
    %
    % WARNING: behavior is UNDEFINED if `Index' is out of range
    % (negative, or greater than the length of `String').
    %
:- pred unsafe_index_next_repl(string::in, int::in, int::out, char::uo,
    maybe_replaced::out) is semidet.

    % prev_index(String, Index, PrevIndex, Char):
    %
    % If `Index - 1' is the final code unit offset of a well-formed sequence in
    % `String' then `Char' is the code point encoded by that sequence, and
    % `PrevIndex' is the initial code unit offset of that sequence.
    %
    % Otherwise, if `Index' is in range, `Char' is either a U+FFFD REPLACEMENT
    % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
    % code point at `Index - 1' (when strings are UTF-16 encoded), and
    % `PrevIndex' is `Index - 1'.
    %
    % Fails if `Index' is out of range (non-positive, or greater than the
    % length of `String').
    %
:- pred prev_index(string::in, int::in, int::out, char::uo) is semidet.

    % prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced):
    %
    % Like prev_index/4 but also returns MaybeReplaced on success.
    % When Char is not U+FFFD, then MaybeReplaced is always `not_replaced'.
    % When Char is U+FFFD (the Unicode replacement character), then there are
    % two cases:
    %
    % - If there is a U+FFFD code point encoded in String at
    %   [PrevIndex, Index) then MaybeReplaced is `not_replaced'.
    %
    % - Otherwise, MaybeReplaced is `replaced_code_unit(CodeUnit)' where
    %   CodeUnit is the code unit in String at Index - 1.
    %
:- pred prev_index_repl(string::in, int::in, int::out, char::uo,
    maybe_replaced::out) is semidet.

    % unsafe_prev_index(String, Index, PrevIndex, Char):
    %
    % Like prev_index/4 but does not check that `Index' is in range.
    % Fails if `Index' is zero.
    %
    % WARNING: behavior is UNDEFINED if `Index' is out of range
    % (negative, or greater than the length of `String').
    %
:- pred unsafe_prev_index(string::in, int::in, int::out, char::uo) is semidet.

    % unsafe_prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced):
    %
    % Like prev_index_repl/5 but does not check that `Index' is in range.
    % Fails if `Index' is zero.
    %
    % WARNING: behavior is UNDEFINED if `Index' is out of range
    % (negative, or greater than the length of `String').
    %
:- pred unsafe_prev_index_repl(string::in, int::in, int::out, char::uo,
    maybe_replaced::out) is semidet.

    % unsafe_index_code_unit(String, Index, CodeUnit):
    %
    % `CodeUnit' is the code unit in `String' at the offset `Index'.
    % WARNING: behavior is UNDEFINED if `Index' is out of range
    % (negative, or greater than or equal to the length of `String').
    %
:- pred unsafe_index_code_unit(string::in, int::in, int::out) is det.

%--------------------------------------------------%
%
% Writing characters to strings.
%

    % set_char(Char, Index, String0, String):
    %
    % `String' is `String0', with the code unit sequence beginning at `Index'
    % replaced by the encoding of `Char'. If the code unit at `Index' is the
    % initial code unit in a valid encoding of a code point, then that entire
    % code unit sequence is replaced. Otherwise, only the code unit at `Index'
    % is replaced.
    %
    % Fails if `Index' is out of range (negative, or greater than or equal to
    % the length of `String0').
    %
    % Throws an exception if `Char' is the null character or a code point that
    % cannot be encoded in a string (namely, surrogate code points cannot be
    % encoded in UTF-8 strings).
    %
:- pred set_char(char, int, string, string).
:- mode set_char(in, in, in, out) is semidet.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode set_char(in, in, di, uo) is semidet.

    % det_set_char(Char, Index, String0, String):
    %
    % Same as set_char/4 but throws an exception if `Index' is out of range
    % (negative, or greater than or equal to the length of `String0').
    %
:- func det_set_char(char, int, string) = string.
:- pred det_set_char(char, int, string, string).
:- mode det_set_char(in, in, in, out) is det.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode det_set_char(in, in, di, uo) is det.

    % unsafe_set_char(Char, Index, String0, String):
    %
    % Same as set_char/4 but does not check if `Index' is in range.
    % WARNING: behavior is UNDEFINED if `Index' is out of range
    % (negative, or greater than or equal to the length of `String0').
    % Use with care!
    %
:- func unsafe_set_char(char, int, string) = string.
:- mode unsafe_set_char(in, in, in) = out is det.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode unsafe_set_char(in, in, di) = uo is det.
:- pred unsafe_set_char(char, int, string, string).
:- mode unsafe_set_char(in, in, in, out) is det.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode unsafe_set_char(in, in, di, uo) is det.

%--------------------------------------------------%
%
% Determining the lengths of strings.
%

    % Determine the length of a string, in code units.
    % An empty string has length zero.
    %
    % NOTE: code points (characters) are encoded using one or more code units,
    % i.e. bytes for UTF-8; 16-bit integers for UTF-16.
    %
:- func length(string::in) = (int::uo) is det.
:- pred length(string, int).
:- mode length(in, uo) is det.
:- mode length(ui, uo) is det.

    % Synonyms for length.
    %
:- func count_code_units(string) = int.
:- pred count_code_units(string::in, int::out) is det.

    % Determine the number of code points in a string.
    %
    % Each valid code point, and each code unit that is part of an ill-formed
    % sequence, contributes one to the result.
    % (This matches the number of steps it would take to iterate over the
    % string using string.index_next or string.prev_index.)
    %
    % NOTE The names of this predicate and several other predicates
    % may be changed in the future to refer to code_points, not codepoints,
    % for consistency with predicate names that talk about code_units.
    %
:- func count_codepoints(string) = int.
:- pred count_codepoints(string::in, int::out) is det.

    % count_utf8_code_units(String) = Length:
    %
    % Return the number of code units required to represent a string in
    % UTF-8 encoding (with allowance for ill-formed sequences).
    % Equivalent to `Length = length(to_utf8_code_unit_list(String))'.
    %
    % Throws an exception if strings use UTF-16 encoding but the given string
    % contains an unpaired surrogate code point. Surrogate code points cannot
    % be represented in UTF-8.
    %
:- func count_utf8_code_units(string) = int.

    % codepoint_offset(String, StartOffset, Count, Offset):
    %
    % Let `S' be the substring of `String' from code unit `StartOffset' to the
    % end of the string. `Offset' is code unit offset after advancing `Count'
    % steps in `S', where each step skips over either:
    %  - one encoding of a Unicode code point, or
    %  - one code unit that is part of an ill-formed sequence.
    %
    % Fails if `StartOffset' is out of range (negative, or greater than the
    % length of `String'), or if there are fewer than `Count' steps possible
    % in `S'.
    %
:- pred codepoint_offset(string::in, int::in, int::in, int::out) is semidet.

    % codepoint_offset(String, Count, Offset):
    %
    % Same as `codepoint_offset(String, 0, Count, Offset)'.
    %
:- pred codepoint_offset(string::in, int::in, int::out) is semidet.

%--------------------------------------------------%
%
% Computing hashes of strings.
%

    % Compute a hash value for a string.
    %
:- func hash(string) = int.
:- pred hash(string::in, int::out) is det.

    % Two other hash functions for strings.
    %
:- func hash2(string) = int.
:- func hash3(string) = int.

    % Cross-compilation-friendly versions of hash, hash2 and hash3
    % respectively.
:- func hash4(string) = int.
:- func hash5(string) = int.
:- func hash6(string) = int.

%--------------------------------------------------%
%
% Tests on strings.
%

    % True if string is the empty string.
    %
:- pred is_empty(string::in) is semidet.

    % True if the string is a valid UTF-8 or UTF-16 string.
    % In target languages that use UTF-8 string encoding, `is_well_formed(S)'
    % is true iff S consists of a well-formed UTF-8 code unit sequence.
    % In target languages that use UTF-16 string encoding, `is_well_formed(S)'
    % is true iff S consists of a well-formed UTF-16 code unit sequence.
    %
:- pred is_well_formed(string::in) is semidet.

    % True if string contains only alphabetic characters [A-Za-z].
    %
:- pred is_all_alpha(string::in) is semidet.

    % True if string contains only alphabetic characters [A-Za-z] and digits
    % [0-9].
    %
:- pred is_all_alnum(string::in) is semidet.

    % True if string contains only alphabetic characters [A-Za-z] and
    % underscores.
    %
:- pred is_all_alpha_or_underscore(string::in) is semidet.

    % True if string contains only alphabetic characters [A-Za-z],
    % digits [0-9], and underscores.
    %
:- pred is_all_alnum_or_underscore(string::in) is semidet.

    % True if the string contains only decimal digits (0-9).
    %
:- pred is_all_digits(string::in) is semidet.

    % all_match(TestPred, String):
    %
    % True iff all code points in String satisfy TestPred, and String contains
    % no ill-formed code unit sequences.
    %
:- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet.

    % contains_char(String, Char):
    %
    % Succeed if the code point Char occurs in String.
    % Any ill-formed code unit sequences within String are ignored
    % as they will not contain Char.
    %
:- pred contains_char(string::in, char::in) is semidet.

    % compare_substrings(Res, X, StartX, Y, StartY, Length):
    %
    % Compare two substrings by code unit order. The two substrings are
    % the substring of `X' between `StartX' and `StartX + Length', and
    % the substring of `Y' between `StartY' and `StartY + Length'.
    % `StartX', `StartY' and `Length' are all in terms of code units.
    %
    % Fails if `StartX' or `StartX + Length' are not within [0, length(X)],
    % or if `StartY' or `StartY + Length' are not within [0, length(Y)],
    % or if `Length' is negative.
    %
:- pred compare_substrings(comparison_result::uo, string::in, int::in,
    string::in, int::in, int::in) is semidet.

    % unsafe_compare_substrings(Res, X, StartX, Y, StartY, Length):
    %
    % Same as compare_between/4 but without range checks.
    % WARNING: if any of `StartX', `StartY', `StartX + Length' or
    % `StartY + Length' are out of range, or if `Length' is negative,
    % then the behaviour is UNDEFINED. Use with care!
    %
:- pred unsafe_compare_substrings(comparison_result::uo, string::in, int::in,
    string::in, int::in, int::in) is det.

    % compare_ignore_case_ascii(Res, X, Y):
    %
    % Compare two strings by code unit order, ignoring the case of letters
    % (A-Z, a-z) in the ASCII range.
    % Equivalent to `compare(Res, to_lower(X), to_lower(Y))'
    % but more efficient.
    %
:- pred compare_ignore_case_ascii(comparison_result::uo,
    string::in, string::in) is det.

    % prefix_length(Pred, String):
    %
    % The length (in code units) of the maximal prefix of `String' consisting
    % entirely of code points satisfying `Pred'.
    %
:- func prefix_length(pred(char)::in(pred(in) is semidet), string::in)
    = (int::out) is det.

    % suffix_length(Pred, String):
    %
    % The length (in code units) of the maximal suffix of `String' consisting
    % entirely of code points satisfying `Pred'.
    %
:- func suffix_length(pred(char)::in(pred(in) is semidet), string::in)
    = (int::out) is det.

    % sub_string_search(String, SubString, Index):
    %
    % `Index' is the code unit position in `String' where the first
    % occurrence of `SubString' begins. Indices start at zero, so if
    % `SubString' is a prefix of `String', this will return Index = 0.
    %
:- pred sub_string_search(string::in, string::in, int::out) is semidet.

    % sub_string_search_start(String, SubString, BeginAt, Index):
    %
    % `Index' is the code unit position in `String' where the first
    % occurrence of `SubString' occurs such that 'Index' is greater than or
    % equal to `BeginAt'. Indices start at zero.
    % Fails if either `BeginAt' is negative, or greater than
    % length(String) - length(SubString).
    %
:- pred sub_string_search_start(string::in, string::in, int::in, int::out)
    is semidet.

    % unsafe_sub_string_search_start(String, SubString, BeginAt, Index):
    %
    % Same as sub_string_search_start/4 but does not check that `BeginAt'
    % is in range.
    % WARNING: if `BeginAt' is either negative, or greater than length(String),
    % then the behaviour is UNDEFINED. Use with care!
    %
:- pred unsafe_sub_string_search_start(string::in, string::in, int::in,
    int::out) is semidet.

%--------------------------------------------------%
%
% Appending strings.
%

    % Append two strings together.
    %
:- func append(string::in, string::in) = (string::uo) is det.

    % append(S1, S2, S3):
    %
    % Append two strings together. S3 consists of the code units of S1
    % followed by the code units of S2, in order.
    %
    % An ill-formed code unit sequence at the end of S1 may join with an
    % ill-formed code unit sequence at the start of S2 to produce a valid
    % encoding of a code point in S3.
    %
    % The append(out, out, in) mode is deprecated because it does not match
    % the semantics of the forwards modes in the presence of ill-formed code
    % unit sequences. Use nondet_append/3 instead.
    %
:- pragma obsolete_proc(append(out, out, in), [nondet_append/3]).
:- pred append(string, string, string).
:- mode append(in, in, in) is semidet.  % implied
:- mode append(in, uo, in) is semidet.
:- mode append(in, in, uo) is det.
:- mode append(uo, in, in) is semidet.
:- mode append(out, out, in) is multi.

    % nondet_append(S1, S2, S3):
    %
    % Non-deterministically return S1 and S2, where S1 ++ S2 = S3.
    % S3 is split after each code point or code unit in an ill-formed sequence.
    %
:- pred nondet_append(string, string, string).
:- mode nondet_append(out, out, in) is multi.

    % S1 ++ S2 = S :- append(S1, S2, S).
    %
    % Append two strings together using nicer inline syntax.
    %
:- func string ++ string = string.
:- mode in ++ in = uo is det.

    % Append a list of strings together.
    %
:- func append_list(list(string)::in) = (string::uo) is det.
:- pred append_list(list(string)::in, string::uo) is det.

    % join_list(Separator, Strings) = JoinedString:
    %
    % Append together the strings in Strings, putting Separator between
    % each pair of adjacent strings. If Strings is the empty list,
    % return the empty string.
    %
:- func join_list(string::in, list(string)::in) = (string::uo) is det.

%--------------------------------------------------%
%
% Making strings from smaller pieces.
%

:- type string_piece
    --->    string(string)
    ;       substring(string, int, int).    % string, start, end offset

    % append_string_pieces(Pieces, String):
    %
    % Append together the strings and substrings in `Pieces' into a string.
    % Throws an exception if `Pieces' contains an element
    % `substring(S, Start, End)' where `Start' or `End' are not within
    % the range [0, length(S)], or if `Start' > `End'.
    %
:- pred append_string_pieces(list(string_piece)::in, string::uo) is det.

    % Same as append_string_pieces/2 but without range checks.
    % WARNING: if any piece `substring(S, Start, End)' has `Start' or `End'
    % outside the range [0, length(S)], or if `Start' > `End',
    % then the behaviour is UNDEFINED. Use with care!
    %
:- pred unsafe_append_string_pieces(list(string_piece)::in, string::uo)
    is det.

%--------------------------------------------------%
%
% Splitting up strings.
%

    % first_char(String, Char, Rest) is true iff `String' begins with a
    % well-formed code unit sequence, `Char' is the code point encoded by
    % that sequence, and `Rest' is the rest of `String' after that sequence.
    %
    % The (uo, in, in) mode throws an exception if `Char' cannot be encoded in
    % a string, or if `Char' is a surrogate code point (for consistency with
    % the other modes).
    %
    % WARNING: first_char makes a copy of Rest because the garbage collector
    % doesn't handle references into the middle of an object, at least not the
    % way we use it. This means that repeated use of first_char to iterate
    % over a string will result in very poor performance. If you want to
    % iterate over the characters in a string, use foldl or to_char_list
    % instead.
    %
:- pred first_char(string, char, string).
:- mode first_char(in, in, in) is semidet.  % implied
:- mode first_char(in, uo, in) is semidet.  % implied
:- mode first_char(in, in, uo) is semidet.  % implied
:- mode first_char(in, uo, uo) is semidet.
:- mode first_char(uo, in, in) is det.

    % split(String, Index, LeftSubstring, RightSubstring):
    %
    % Split a string into two substrings at the code unit offset `Index'.
    % (If `Index' is out of the range [0, length of `String'], it is treated
    % as if it were the nearest end-point of that range.)
    %
:- pred split(string::in, int::in, string::out, string::out) is det.

    % split_by_codepoint(String, Count, LeftSubstring, RightSubstring):
    %
    % `LeftSubstring' is the left-most `Count' code points of `String',
    % and `RightSubstring' is the remainder of `String'.
    % (If `Count' is out of the range [0, length of `String'], it is treated
    % as if it were the nearest end-point of that range.)
    %
:- pred split_by_codepoint(string::in, int::in, string::out, string::out)
    is det.

    % left(String, Count, LeftSubstring):
    %
    % `LeftSubstring' is the left-most `Count' code units of `String'.
    % (If `Count' is out of the range [0, length of `String'], it is treated
    % as if it were the nearest end-point of that range.)
    %
:- func left(string::in, int::in) = (string::out) is det.
:- pred left(string::in, int::in, string::out) is det.

    % left_by_codepoint(String, Count, LeftSubstring):
    %
    % `LeftSubstring' is the left-most `Count' code points of `String'.
    % (If `Count' is out of the range [0, length of `String'], it is treated
    % as if it were the nearest end-point of that range.)
    %
:- func left_by_codepoint(string::in, int::in) = (string::out) is det.
:- pred left_by_codepoint(string::in, int::in, string::out) is det.

    % right(String, Count, RightSubstring):
    %
    % `RightSubstring' is the right-most `Count' code units of `String'.
    % (If `Count' is out of the range [0, length of `String'], it is treated
    % as if it were the nearest end-point of that range.)
    %
:- func right(string::in, int::in) = (string::out) is det.
:- pred right(string::in, int::in, string::out) is det.

    % right_by_codepoint(String, Count, RightSubstring):
    %
    % `RightSubstring' is the right-most `Count' code points of `String'.
    % (If `Count' is out of the range [0, length of `String'], it is treated
    % as if it were the nearest end-point of that range.)
    %
:- func right_by_codepoint(string::in, int::in) = (string::out) is det.
:- pred right_by_codepoint(string::in, int::in, string::out) is det.

    % between(String, Start, End, Substring):
    %
    % `Substring' consists of the segment of `String' within the half-open
    % interval [Start, End), where `Start' and `End' are code unit offsets.
    % (If `Start' is out of the range [0, length of `String'], it is treated
    % as if it were the nearest end-point of that range.
    % If `End' is out of the range [`Start', length of `String'],
    % it is treated as if it were the nearest end-point of that range.)
    %
:- func between(string::in, int::in, int::in) = (string::uo) is det.
:- pred between(string::in, int::in, int::in, string::uo) is det.

    % between_codepoints(String, Start, End, Substring):
    %
    % `Substring' is the part of `String' between the code point positions
    % `Start' and `End'. The result is equivalent to:
    %
    %   between(String, StartOffset, EndOffset, Substring)
    %
    % where:
    %
    %   StartOffset is from codepoint_offset(String, Start, StartOffset)
    %     if Start is in [0, count_codepoints(String)],
    %   StartOffset = 0 if Start < 0,
    %   StartOffset = length(String) otherwise;
    %
    %   EndOffset is from codepoint_offset(String, End, EndOffset)
    %     if End is in [0, count_codepoints(String)],
    %   EndOffset = 0 if End < 0,
    %   EndOffset = length(String) otherwise.
    %
    % between/4 will enforce StartOffset =< EndOffset.
    %
:- func between_codepoints(string::in, int::in, int::in)
    = (string::uo) is det.
:- pred between_codepoints(string::in, int::in, int::in, string::uo) is det.

    % unsafe_between(String, Start, End, Substring):
    %
    % `Substring' consists of the segment of `String' within the half-open
    % interval [Start, End), where `Start' and `End' are code unit offsets.
    % WARNING: if `Start' is out of the range [0, length of `String'] or
    % `End' is out of the range [`Start', length of `String']
    % then the behaviour is UNDEFINED. Use with care!
    % This version takes time proportional to the length of the substring,
    % whereas substring may take time proportional to the length
    % of the whole string.
    %
:- func unsafe_between(string::in, int::in, int::in) = (string::uo) is det.
:- pred unsafe_between(string::in, int::in, int::in, string::uo) is det.

    % words_separator(SepP, String) returns the list of non-empty
    % substrings of String (in first to last order) that are delimited
    % by non-empty sequences of code points matched by SepP.
    % For example,
    %
    % words_separator(char.is_whitespace, " the cat  sat on the  mat") =
    %   ["the", "cat", "sat", "on", "the", "mat"]
    %
    % Note the difference to split_at_separator.
    %
:- func words_separator(pred(char), string) = list(string).
:- mode words_separator(pred(in) is semidet, in) = out is det.

    % words(String) =
    %   words_separator(char.is_whitespace, String).
    %
:- func words(string) = list(string).

    % split_at_separator(SepP, String) returns the list of (possibly empty)
    % substrings of String (in first to last order) that are delimited
    % by code points matched by SepP. For example,
    %
    % split_at_separator(char.is_whitespace, " a cat  sat on the  mat")
    %   = ["", "a", "cat", "", "sat", "on", "the", "", "mat"]
    %
    % Note the difference to words_separator.
    %
:- func split_at_separator(pred(char), string) = list(string).
:- mode split_at_separator(pred(in) is semidet, in) = out is det.

    % split_at_char(Char, String) =
    %     split_at_separator(unify(Char), String)
    %
:- func split_at_char(char, string) = list(string).

    % split_at_string(Separator, String) returns the list of substrings
    % of String that are delimited by Separator. For example,
    %
    % split_at_string("|||", "|||fld2|||fld3") = ["", "fld2", [fld3"]
    %
    % Always the first match of Separator is used to break the String, for
    % example: split_at_string("aa", "xaaayaaaz") = ["x", "ay", "az"]
    %
:- func split_at_string(string, string) = list(string).

    % split_into_lines(String) breaks String into a sequence of lines,
    % with each line consisting of a possibly empty sequence of non-newline
    % characters, followed either by a newline character, or by the end
    % of the string. The string returned for a line will not contain
    % the newline character.
    %
:- func split_into_lines(string) = list(string).

%--------------------------------------------------%
%
% Dealing with prefixes and suffixes.
%

    % prefix(String, Prefix) is true iff Prefix is a prefix of String.
    % Same as append(Prefix, _, String).
    %
:- pragma obsolete_proc(prefix(in, out)).
:- pred prefix(string, string).
:- mode prefix(in, in) is semidet.
:- mode prefix(in, out) is multi.

    % suffix(String, Suffix) is true iff Suffix is a suffix of String.
    % Same as append(_, Suffix, String).
    %
:- pragma obsolete_proc(suffix(in, out)).
:- pred suffix(string, string).
:- mode suffix(in, in) is semidet.
:- mode suffix(in, out) is multi.

    % remove_prefix(Prefix, String, Suffix):
    %
    % This is a synonym for append(Prefix, Suffix, String) but with the
    % arguments in a more convenient order for use with higher-order code.
    %
    % WARNING: the argument order differs from remove_suffix.
    %
:- pred remove_prefix(string::in, string::in, string::out) is semidet.

    % det_remove_prefix(Prefix, String, Suffix):
    %
    % This is a synonym for append(Prefix, Suffix, String) but with the
    % arguments in a more convenient order for use with higher-order code.
    %
    % WARNING: the argument order differs from remove_suffix.
    %
:- pred det_remove_prefix(string::in, string::in, string::out) is det.

    % remove_prefix_if_present(Prefix, String) = Suffix returns `String' minus
    % `Prefix' if `String' begins with `Prefix', and `String' if it doesn't.
    %
:- func remove_prefix_if_present(string, string) = string.

    % remove_suffix(String, Suffix, Prefix):
    %
    % The same as append(Prefix, Suffix, String).
    %
    % WARNING: the argument order differs from both remove_prefix and
    % remove_suffix_if_present.
    %
:- pred remove_suffix(string::in, string::in, string::out) is semidet.

    % det_remove_suffix(String, Suffix) returns the same value as
    % remove_suffix, except it throws an exception if String does not end
    % with Suffix.
    %
    % WARNING: the argument order differs from both remove_prefix and
    % remove_suffix_if_present.
    %
:- func det_remove_suffix(string, string) = string.

    % remove_suffix_if_present(Suffix, String) returns `String' minus `Suffix'
    % if `String' ends with `Suffix', and `String' if it doesn't.
    %
    % WARNING: the argument order differs from remove_suffix and
    % det_remove_suffix.
    %
:- func remove_suffix_if_present(string, string) = string.

    % add_suffix(Suffix, Str) = StrSuffix:
    %
    % Does the same job as Str ++ Suffix = StrSuffix, but allows
    % using list.map to add the same suffix to many strings.
    %
:- func add_suffix(string, string) = string.

%--------------------------------------------------%
%
% Transformations of strings.
%

    % Convert the first character (if any) of a string to uppercase.
    % Only letters (a-z) in the ASCII range are converted.
    %
    % This function transforms the initial code point of a string,
    % whether or not the code point occurs as part of a combining sequence.
    %
:- func capitalize_first(string) = string.
:- pred capitalize_first(string::in, string::out) is det.

    % Convert the first character (if any) of a string to lowercase.
    % Only letters (A-Z) in the ASCII range are converted.
    %
    % This function transforms the initial code point of a string,
    % whether or not the code point occurs as part of a combining sequence.
    %
:- func uncapitalize_first(string) = string.
:- pred uncapitalize_first(string::in, string::out) is det.

    % Converts a string to uppercase.
    % Only letters (A-Z) in the ASCII range are converted.
    %
    % This function transforms each code point individually.
    % Letters that occur within a combining sequence will be converted,
    % whereas the precomposed character equivalent to the combining
    % sequence would not be converted. For example:
    %
    %   to_upper("a\u0301") ==> "A\u0301"   % á decomposed
    %   to_upper("\u00E1")  ==> "\u00E1"    % á precomposed
    %
:- func to_upper(string::in) = (string::uo) is det.
:- pred to_upper(string, string).
:- mode to_upper(in, uo) is det.
:- mode to_upper(in, in) is semidet.        % implied

    % Converts a string to lowercase.
    % Only letters (a-z) in the ASCII range are converted.
    %
    % This function transforms each code point individually.
    % Letters that occur within a combining sequence will be converted,
    % whereas the precomposed character equivalent to the combining
    % sequence would not be converted. For example:
    %
    %   to_lower("A\u0301") ==> "a\u0301"   % Á decomposed
    %   to_lower("\u00C1")  ==> "\u00C1"    % Á precomposed
    %
:- func to_lower(string::in) = (string::uo) is det.
:- pred to_lower(string, string).
:- mode to_lower(in, uo) is det.
:- mode to_lower(in, in) is semidet.        % implied

    % pad_left(String0, PadChar, Width, String):
    %
    % Insert `PadChar's at the left of `String0' until it is at least as long
    % as `Width', giving `String'. Width is currently measured as the number
    % of code points.
    %
:- func pad_left(string, char, int) = string.
:- pred pad_left(string::in, char::in, int::in, string::out) is det.

    % pad_right(String0, PadChar, Width, String):
    %
    % Insert `PadChar's at the right of `String0' until it is at least as long
    % as `Width', giving `String'. Width is currently measured as the number
    % of code points.
    %
:- func pad_right(string, char, int) = string.
:- pred pad_right(string::in, char::in, int::in, string::out) is det.

    % chomp(String):
    %
    % Return `String' minus any single trailing newline character.
    %
:- func chomp(string) = string.

    % strip(String):
    %
    % Returns `String' minus any initial and trailing ASCII whitespace
    % characters, i.e. characters satisfying `char.is_whitespace'.
    %
:- func strip(string) = string.

    % lstrip(String):
    %
    % Return `String' minus any initial ASCII whitespace characters,
    % i.e. characters satisfying `char.is_whitespace'.
    %
:- func lstrip(string) = string.

    % rstrip(String):
    %
    % Returns `String' minus any trailing ASCII whitespace characters,
    % i.e. characters satisfying `char.is_whitespace'.
    %
:- func rstrip(string) = string.

    % lstrip_pred(Pred, String):
    %
    % Returns `String' minus the maximal prefix consisting entirely
    % of code points satisfying `Pred'.
    %
:- func lstrip_pred(pred(char)::in(pred(in) is semidet), string::in)
    = (string::out) is det.

    % rstrip_pred(Pred, String):
    %
    % Returns `String' minus the maximal suffix consisting entirely
    % of code points satisfying `Pred'.
    %
:- func rstrip_pred(pred(char)::in(pred(in) is semidet), string::in)
    = (string::out) is det.

    % replace(String0, Pattern, Subst, String):
    %
    % Replaces the first occurrence of Pattern in String0 with Subst to give
    % String. Fails if Pattern does not occur in String0.
    %
:- pred replace(string::in, string::in, string::in, string::uo) is semidet.

    % replace_all(String0, Pattern, Subst, String):
    %
    % Replaces any occurrences of Pattern in String0 with Subst to give
    % String.
    %
    % If Pattern is the empty string then Subst is inserted at every point
    % in String0 except between two code units in an encoding of a code point.
    % For example, these are true:
    %
    %   replace_all("", "", "|", "|")
    %   replace_all("a", "", "|", "|a|")
    %   replace_all("ab", "", "|", "|a|b|")
    %
:- func replace_all(string::in, string::in, string::in) = (string::uo) is det.
:- pred replace_all(string::in, string::in, string::in, string::uo) is det.

    % word_wrap(Str, N) = Wrapped:
    %
    % Wrapped is Str with newlines inserted between words (separated by ASCII
    % space characters) so that at most N code points appear on any line,
    % and each line contains as many whole words as possible subject to that
    % constraint. If any one word exceeds N code points in length, then
    % it will be broken over two (or more) lines. Sequences of whitespace
    % characters are replaced by a single space.
    %
    % See `char.is_whitespace' for the definition of whitespace characters
    % used by this predicate.
    %
:- func word_wrap(string, int) = string.

    % word_wrap_separator(Str, N, WordSeparator) = Wrapped:
    %
    % word_wrap_separator/3 is like word_wrap/2, except that words that
    % need to be broken up over multiple lines have WordSeparator inserted
    % between each piece. If the length of WordSeparator is greater than
    % or equal to N code points, then no separator is used.
    %
:- func word_wrap_separator(string, int, string) = string.

%--------------------------------------------------%
%
% Folds over the characters in strings.
%

    % foldl(Closure, String, !Acc):
    %
    % `Closure' is an accumulator predicate which is to be called for each
    % code point of the string `String' in turn.
    % If `String' contains ill-formed sequences, `Closure' is called for each
    % code unit in an ill-formed sequence. If strings use UTF-8 encoding,
    % U+FFFD is passed to `Closure' in place of each such code unit.
    % If strings use UTF-16 encoding, each code unit in an ill-formed sequence
    % is an unpaired surrogate code point, which will be passed to `Closure'.
    %
    % The initial value of the accumulator is `!.Acc' and the final value is
    % `!:Acc'.
    % (foldl(Closure, String, !Acc)  is equivalent to
    %   to_char_list(String, Chars),
    %   list.foldl(Closure, Chars, !Acc)
    % but is implemented more efficiently.)
    %
:- func foldl(func(char, A) = A, string, A) = A.
:- pred foldl(pred(char, A, A), string, A, A).
:- mode foldl(pred(in, di, uo) is det, in, di, uo) is det.
:- mode foldl(pred(in, in, out) is det, in, in, out) is det.
:- mode foldl(pred(in, in, out) is semidet, in, in, out) is semidet.
:- mode foldl(pred(in, in, out) is nondet, in, in, out) is nondet.
:- mode foldl(pred(in, in, out) is multi, in, in, out) is multi.

    % foldl2(Closure, String, !Acc1, !Acc2):
    % A variant of foldl with two accumulators.
    %
:- pred foldl2(pred(char, A, A, B, B), string, A, A, B, B).
:- mode foldl2(pred(in, di, uo, di, uo) is det,
    in, di, uo, di, uo) is det.
:- mode foldl2(pred(in, in, out, di, uo) is det,
    in, in, out, di, uo) is det.
:- mode foldl2(pred(in, in, out, in, out) is det,
    in, in, out, in, out) is det.
:- mode foldl2(pred(in, in, out, in, out) is semidet,
    in, in, out, in, out) is semidet.
:- mode foldl2(pred(in, in, out, in, out) is nondet,
    in, in, out, in, out) is nondet.
:- mode foldl2(pred(in, in, out, in, out) is multi,
    in, in, out, in, out) is multi.

    % foldl_between(Closure, String, Start, End, !Acc)
    % is equivalent to foldl(Closure, SubString, !Acc)
    % where SubString = between(String, Start, End).
    %
    % `Start' and `End' are in terms of code units.
    %
:- func foldl_between(func(char, A) = A, string, int, int, A) = A.
:- pred foldl_between(pred(char, A, A), string, int, int, A, A).
:- mode foldl_between(pred(in, in, out) is det, in, in, in,
    in, out) is det.
:- mode foldl_between(pred(in, di, uo) is det, in, in, in,
    di, uo) is det.
:- mode foldl_between(pred(in, in, out) is semidet, in, in, in,
    in, out) is semidet.
:- mode foldl_between(pred(in, in, out) is nondet, in, in, in,
    in, out) is nondet.
:- mode foldl_between(pred(in, in, out) is multi, in, in, in,
    in, out) is multi.

    % foldl2_between(Closure, String, Start, End, !Acc1, !Acc2)
    % A variant of foldl_between with two accumulators.
    %
    % `Start' and `End' are in terms of code units.
    %
:- pred foldl2_between(pred(char, A, A, B, B),
    string, int, int, A, A, B, B).
:- mode foldl2_between(pred(in, di, uo, di, uo) is det,
    in, in, in, di, uo, di, uo) is det.
:- mode foldl2_between(pred(in, in, out, di, uo) is det,
    in, in, in, in, out, di, uo) is det.
:- mode foldl2_between(pred(in, in, out, in, out) is det,
    in, in, in, in, out, in, out) is det.
:- mode foldl2_between(pred(in, in, out, in, out) is semidet,
    in, in, in, in, out, in, out) is semidet.
:- mode foldl2_between(pred(in, in, out, in, out) is nondet,
    in, in, in, in, out, in, out) is nondet.
:- mode foldl2_between(pred(in, in, out, in, out) is multi,
    in, in, in, in, out, in, out) is multi.

    % foldr(Closure, String, !Acc):
    % As foldl/4, except that processing proceeds right-to-left.
    %
:- func foldr(func(char, T) = T, string, T) = T.
:- pred foldr(pred(char, T, T), string, T, T).
:- mode foldr(pred(in, in, out) is det, in, in, out) is det.
:- mode foldr(pred(in, di, uo) is det, in, di, uo) is det.
:- mode foldr(pred(in, in, out) is semidet, in, in, out) is semidet.
:- mode foldr(pred(in, in, out) is nondet, in, in, out) is nondet.
:- mode foldr(pred(in, in, out) is multi, in, in, out) is multi.

    % foldr_between(Closure, String, Start, End, !Acc)
    % is equivalent to foldr(Closure, SubString, !Acc)
    % where SubString = between(String, Start, End).
    %
    % `Start' and `End' are in terms of code units.
    %
:- func foldr_between(func(char, T) = T, string, int, int, T) = T.
:- pred foldr_between(pred(char, T, T), string, int, int, T, T).
:- mode foldr_between(pred(in, in, out) is det, in, in, in,
    in, out) is det.
:- mode foldr_between(pred(in, di, uo) is det, in, in, in,
    di, uo) is det.
:- mode foldr_between(pred(in, in, out) is semidet, in, in, in,
    in, out) is semidet.
:- mode foldr_between(pred(in, in, out) is nondet, in, in, in,
    in, out) is nondet.
:- mode foldr_between(pred(in, in, out) is multi, in, in, in,
    in, out) is multi.

%--------------------------------------------------%
%
% Formatting tables.
%

:- type justified_column
    --->    left(list(string))
    ;       right(list(string)).

    % format_table(Columns, Separator) = Table:
    %
    % This function takes a list of columns and a column separator,
    % and returns a formatted table, where each field in each column
    % has been aligned and fields are separated with Separator.
    % There will be a newline character between each pair of rows.
    % Throws an exception if the columns are not all the same length.
    % Lengths are currently measured in terms of code points.
    %
    % For example:
    %
    % format_table([right(["a", "bb", "ccc"]), left(["1", "22", "333"])],
    %   " * ")
    % would return the table:
    %   a * 1
    %  bb * 22
    % ccc * 333
    %
:- func format_table(list(justified_column), string) = string.

    % format_table_max(Columns, Separator) does the same job as format_table,
    % but allows the caller to associate a maximum width with each column.
    %
:- func format_table_max(assoc_list(justified_column, maybe(int)), string)
    = string.

%--------------------------------------------------%
%
% Converting strings to docs.
%

    % Convert a string to a pretty_printer.doc for formatting.
    %
:- func string_to_doc(string) = pretty_printer.doc.

%--------------------------------------------------%
%
% Converting strings to values of builtin types.
%

    % Convert a string to an int. The string must contain only digits [0-9],
    % optionally preceded by a plus or minus sign. If the string does
    % not match this syntax or the number is not in the range
    % [min_int + 1, max_int], to_int fails.
    %
:- pred to_int(string::in, int::out) is semidet.

    % Convert a signed base 10 string to an int. Throws an exception if the
    % string argument does not match the regexp [+-]?[0-9]+ or the number is
    % not in the range [min_int + 1, max_int].
    %
:- func det_to_int(string) = int.

    % Convert a string in the specified base (2-36) to an int. The string
    % must contain one or more digits in the specified base, optionally
    % preceded by a plus or minus sign. For bases > 10, digits 10 to 35
    % are represented by the letters A-Z or a-z. If the string does not match
    % this syntax or the number is not in the range [min_int, max_int],
    % the predicate fails.
    %
:- pred base_string_to_int(int::in, string::in, int::out) is semidet.

    % Convert a signed base N string to an int. Throws an exception
    % if the string argument is not precisely an optional sign followed by
    % a non-empty string of base N digits, or if the number is not in
    % the range [min_int, max_int].
    %
:- func det_base_string_to_int(int, string) = int.

%--------------------------------------------------%

    % Convert a string to a uint. The string must contain only digits [0-9].
    % If the string does not match this syntax or the number is not
    % in the range [0, max_uint], to_uint fails.
    %
:- pred to_uint(string::in, uint::out) is semidet.

    % Convert a signed base 10 string to a uint. Throws an exception if the
    % string argument does not match the regexp [0-9]+ or the number is
    % not in the range [0, max_uint].
    %
:- func det_to_uint(string) = uint.

    % Convert a string in the specified base (2-36) to a uint. The string
    % must contain one or more digits in the specified base. For bases > 10,
    % digits 10 to 35 are represented by the letters A-Z or a-z. If the string
    % does not match this syntax or the number is not in the range
    % [0, max_uint], the predicate fails.
    %
:- pred base_string_to_uint(int::in, string::in, uint::out) is semidet.

    % Convert a signed base N string to a uint. Throws an exception
    % if the string argument is not precisely a non-empty string of base N
    % digits, or if the number is not in the range [0, max_uint].
    %
:- func det_base_string_to_uint(int, string) = uint.

%--------------------------------------------------%

    % Convert a string to a float, returning infinity or -infinity if the
    % conversion overflows. Fails if the string is not a syntactically correct
    % float literal.
    %
:- pred to_float(string::in, float::out) is semidet.

    % Convert a string to a float, returning infinity or -infinity if the
    % conversion overflows. Throws an exception if the string is not a
    % syntactically correct float literal.
    %
:- func det_to_float(string) = float.

%--------------------------------------------------%
%
% Converting values of builtin types to strings.
%

    % char_to_string(Char, String):
    %
    % Converts a character to a string, or vice versa.
    % True if `String' is the well-formed string that encodes the code point
    % `Char'; or, if strings are UTF-16 encoded, `Char' is a surrogate code
    % point and `String' is the string that contains only that surrogate code
    % point. Otherwise, `char_to_string(Char, String)' is false.
    %
    % Throws an exception if `Char' is the null character or a code point that
    % cannot be encoded in a string (namely, surrogate code points cannot be
    % encoded in UTF-8 strings).
    %
:- func char_to_string(char::in) = (string::uo) is det.
:- pred char_to_string(char, string).
:- mode char_to_string(in, uo) is det.
:- mode char_to_string(out, in) is semidet.

    % A synonym for char_to_string/1.
    %
:- func from_char(char::in) = (string::uo) is det.

    % Convert an integer to a string in base 10.
    % See int_to_base_string for the string format.
    %
:- func int_to_string(int::in) = (string::uo) is det.
:- pred int_to_string(int::in, string::uo) is det.

    % A synonym for int_to_string/1.
    %
:- func from_int(int::in) = (string::uo) is det.

    % int_to_base_string(Int, Base, String):
    %
    % Convert an integer to a string in a given Base.
    % `String' will consist of a minus sign (U+002D HYPHEN-MINUS)
    % if `Int' is negative, followed by one or more decimal digits (0-9)
    % or uppercase letters (A-Z). There will be no leading zeros.
    %
    % Base must be between 2 and 36, both inclusive; if it is not,
    % the predicate will throw an exception.
    %
:- func int_to_base_string(int::in, int::in) = (string::uo) is det.
:- pred int_to_base_string(int::in, int::in, string::uo) is det.

    % Convert an integer to a string in base 10 with commas as thousand
    % separators.
    %
:- func int_to_string_thousands(int::in) = (string::uo) is det.

    % int_to_base_string_group(Int, Base, GroupLength, Separator, String):
    %
    % Convert an integer to a string in a given Base,
    % in the same format as int_to_base_string,
    % with Separator inserted between every GroupLength digits
    % (grouping from the end of the string).
    % If GroupLength is less than one, no separators will appear
    % in the output. Useful for formatting numbers like "1,300,000".
    %
    % Base must be between 2 and 36, both inclusive; if it is not,
    % the predicate will throw an exception.
    %
:- func int_to_base_string_group(int, int, int, string) = string.
:- mode int_to_base_string_group(in, in, in, in) = uo is det.

    % Convert an unsigned integer to a string in base 10.
    %
:- func uint_to_string(uint::in) = (string::uo) is det.

    % Convert an unsigned integer to a string in base 16.
    % Alphabetic digits will be lowercase (e.g. a-f).
    %
:- func uint_to_hex_string(uint::in) = (string::uo) is det.
:- func uint_to_lc_hex_string(uint::in) = (string::uo) is det.

    % Convert an unsigned integer to a string in base 16.
    % Alphabetic digits will be uppercase (e.g. A-F).
    %
:- func uint_to_uc_hex_string(uint::in) = (string::uo) is det.

    % Convert an unsigned integer to a string in base 8.
    %
:- func uint_to_octal_string(uint::in) = (string::uo) is det.

    % Convert a signed/unsigned 8/16/32/64 bit integer to a string.
    %
:- func int8_to_string(int8::in) = (string::uo) is det.
:- func uint8_to_string(uint8::in) = (string::uo) is det.
:- func int16_to_string(int16::in) = (string::uo) is det.
:- func uint16_to_string(uint16::in) = (string::uo) is det.
:- func int32_to_string(int32::in) = (string::uo) is det.
:- func uint32_to_string(uint32::in) = (string::uo) is det.
:- func int64_to_string(int64::in) = (string::uo) is det.
:- func uint64_to_string(uint64::in) = (string::uo) is det.

    % Convert an unsigned 64-bit integer to a string in base 16.
    % Alphabetic digits will be lowercase (e.g. a-f).
    %
:- func uint64_to_hex_string(uint64::in) = (string::uo) is det.
:- func uint64_to_lc_hex_string(uint64::in) = (string::uo) is det.

    % Convert an unsigned 64-bit integer to a string in base 16.
    % Alphabetic digits will be uppercase (e.g. A-F).
    %
:- func uint64_to_uc_hex_string(uint64::in) = (string::uo) is det.

    % Convert an unsigned 64-bit integer to a string in base 8.
    %
:- func uint64_to_octal_string(uint64::in) = (string::uo) is det.

    % Convert a float to a string.
    % In the current implementation, the resulting float will be in the form
    % that it was printed using the format string "%#.<prec>g".
    % <prec> will be in the range p to (p+2)
    % where p = floor(mantissa_digits * log2(base_radix) / log2(10)).
    % The precision chosen from this range will be such as to allow
    % a successful decimal -> binary conversion of the float.
    %
:- func float_to_string(float::in) = (string::uo) is det.
:- pred float_to_string(float::in, string::uo) is det.

    % A synonym for float_to_string/1.
    %
:- func from_float(float::in) = (string::uo) is det.

    % Convert a c_pointer to a string. The format is "c_pointer(0xXXXX)"
    % where XXXX is the hexadecimal representation of the pointer.
    %
:- func c_pointer_to_string(c_pointer::in) = (string::uo) is det.
:- pred c_pointer_to_string(c_pointer::in, string::uo) is det.

    % A synonym for c_pointer_to_string/1.
    %
:- func from_c_pointer(c_pointer::in) = (string::uo) is det.

%--------------------------------------------------%
%
% Converting values of arbitrary types to strings.
%

    % string(X): Returns a canonicalized string representation of the value X
    % using the standard Mercury operators.
    %
:- func string(T) = string.

    % As above, but using the supplied table of operators.
    %
:- func string_ops(ops.table, T) = string.

    % string_ops_noncanon(NonCanon, OpsTable, X, String)
    %
    % As above, but the caller specifies what behaviour should occur for
    % non-canonical terms (i.e. terms where multiple representations
    % may compare as equal):
    %
    % - `do_not_allow' will throw an exception if (any subterm of)
    %    the argument is not canonical;
    % - `canonicalize' will substitute a string indicating the presence
    %    of a non-canonical subterm;
    % - `include_details_cc' will show the structure of any non-canonical
    %   subterms, but can only be called from a committed choice context.
    %
:- pred string_ops_noncanon(noncanon_handling, ops.table, T, string).
:- mode string_ops_noncanon(in(do_not_allow), in, in, out) is det.
:- mode string_ops_noncanon(in(canonicalize), in, in, out) is det.
:- mode string_ops_noncanon(in(include_details_cc), in, in, out) is cc_multi.
:- mode string_ops_noncanon(in, in, in, out) is cc_multi.

%--------------------------------------------------%
%
% Converting values to strings based on a format string.
%

:- type poly_type
    --->    f(float)
    ;       i(int)
    ;       i8(int8)
    ;       i16(int16)
    ;       i32(int32)
    ;       i64(int64)
    ;       u(uint)
    ;       u8(uint8)
    ;       u16(uint16)
    ;       u32(uint32)
    ;       u64(uint64)
    ;       s(string)
    ;       c(char).

    % A function similar to sprintf() in C.
    %
    % For example,
    %   format("%s %i %c %f\n",
    %       [s("Square-root of"), i(2), c('='), f(1.41)], String)
    % will return
    %   String = "Square-root of 2 = 1.41\n".
    %
    % The following options available in C are supported: flags [0+-# ],
    % a field width (or *), and a precision (could be a ".*").
    %
    % Valid conversion character types are {dioxXucsfeEgGp%}. %n is not
    % supported. format will not return the length of the string.
    %
    % conv  var         output form.      effect of '#'.
    % char. type(s).
    %
    % d     int         signed integer
    % i     int         signed integer
    % o     int, uint   unsigned octal    with '0' prefix
    % x,X   int, uint   unsigned hex      with '0x', '0X' prefix
    % u     int, uint   unsigned integer
    % c     char        character
    % s     string      string
    % f     float       rational number   with '.', if precision 0
    % e,E   float       [-]m.dddddE+-xx   with '.', if precision 0
    % g,G   float       either e or f     with trailing zeros.
    % p     int, uint   integer
    %
    % The valid conversion characters for int8, int16, int32 and int64
    % are the same as for int, and the valid conversion characters for
    % uint8, uint16, uint32 and uint64 are the same as for uint.
    %
    % An option of zero will cause any padding to be zeros rather than spaces.
    % A '-' will cause the output to be left-justified in its 'space'.
    % (Without a `-', the default is for fields to be right-justified.)
    % A '+' forces a sign to be printed. This is not sensible for string
    % and character output. A ' ' causes a space to be printed before a thing
    % if there is no sign there. The other option is the '#', which modifies
    % the output string's format. These options are normally put directly
    % after the '%'.
    %
    % Notes:
    %
    % %#.0e, %#.0E now prints a '.' before the 'e'.
    %
    % Asking for more precision than a float actually has will result in
    % potentially misleading output.
    %
    % Numbers are now rounded by precision value, not truncated as previously.
    %
    % The implementation uses the sprintf() function in C grades,
    % so the actual output will depend on the C standard library.
    %
:- func format(string, list(poly_type)) = string.
:- pred format(string::in, list(poly_type)::in, string::out) is det.

%--------------------------------------------------%
%--------------------------------------------------%


Next: , Previous: string.builder, Up: Top   [Contents]