%--------------------------------------------------%
% vim: ts=4 sw=4 et ft=mercury
%--------------------------------------------------%
% Copyright (C) 1993-2012 The University of Melbourne.
% Copyright (C) 2013-2025 The Mercury team.
% This file is distributed under the terms specified in COPYING.LIB.
%--------------------------------------------------%
%
% File: string.m.
% Main authors: fjh, petdr, wangp.
% Stability: high.
%
% This module provides basic string handling facilities.
%
% Mercury strings are Unicode strings. They use either the UTF-8 or UTF-16
% encoding, depending on the target language.
%
% When Mercury is compiled to C, strings are UTF-8 encoded, with a null
% character as the string terminator. With UTF-8, each code unit is one byte,
% and a single code point requires one to four of these code units to encode.
%
% When Mercury is compiled to Java, strings are represented using Java's
% String type. When Mercury is compiled to C#, strings are represented using
% C#'s `System.String' type. Both of these types use the UTF-16 encoding.
% With UTF-16, each code unit is a 16 bit integer, and a single code point
% requires one or two of these code units to encode.
%
% The Mercury compiler will only allow well-formed UTF-8 or UTF-16 string
% constants. However, it is possible to produce strings containing invalid
% UTF-8 or UTF-16 via I/O, foreign code, and substring operations.
% Predicates or functions that inspect strings may fail, throw an exception,
% or else behave in some other special way when they encounter an ill-formed
% code unit sequence.
%
% Unexpected null characters embedded in the middle of strings can be a source
% of security vulnerabilities, so the Mercury library predicates and functions
% which create strings from (lists of) characters throw an exception if they
% detect such a null character. Programmers must not create strings that might
% contain null characters using the foreign language interface.
%
% The builtin comparison operation on strings is also dependent on the target
% language. The current implementation performs string comparison using
%
% - C's strcmp() function, when compiling to C;
% - Java's String.compareTo() method, when compiling to Java; and
% - C#'s System.String.CompareOrdinal() method, when compiling to C#.
%
%--------------------------------------------------%
%
% This module is divided into several sections. These sections are:
%
% - Wrapper types that associate particular semantics with raw strings.
% - Identifying the Unicode encoding form used by the current platform.
% - Converting between strings and lists of characters.
% - Reading characters from strings.
% - Writing characters to strings.
% - Determining the lengths of strings.
% - Computing hashes of strings.
% - Tests on strings.
% - Appending strings.
% - Making strings from smaller pieces.
% - Splitting up strings.
% - Dealing with prefixes and suffixes.
% - Transformations of strings.
% - Folds over the characters in strings.
% - Formatting tables.
% - Converting strings to docs.
% - Converting strings to values of builtin types.
% - Converting values of builtin types to strings.
% - Converting values of arbitrary types to strings.
% - Converting values to strings based on a format string.
%
%--------------------------------------------------%
:- module string.
:- interface.
:- include_module builder.
:- import_module assoc_list.
:- import_module char.
:- import_module deconstruct.
:- import_module list.
:- import_module maybe.
:- import_module ops.
:- import_module pretty_printer.
%--------------------------------------------------%
%
% Wrapper types that associate particular semantics with raw strings.
%
% These types are useful for defining stream typeclass instances
% where you want different instances for strings representing different
% semantic entities. Using the string type itself, without a wrapper,
% would be ambiguous in such situations.
%
% While each module that associates semantics with strings could define
% its own wrapper types, the notions of lines and text files are so common
% that it is simpler to define them just once, and this is the logical
% place to do that.
%
% A line is:
%
% - a possibly empty sequence of non-newline characters terminated by a
% newline character; or
% - a non-empty sequence of non-newline characters terminated by the end
% of the file.
%
:- type line
---> line(string).
% A text file is a possibly empty sequence of characters
% terminated by the end of the file.
%
:- type text_file
---> text_file(string).
%--------------------------------------------------%
:- type string_encoding
---> utf8
; utf16.
% Return the internal string encoding on the current platform.
%
:- func internal_string_encoding = string_encoding.
%--------------------------------------------------%
%
% Conversions between strings and lists of characters.
%
% Convert the string to a list of characters (code points).
%
% If strings use UTF-8 encoding, then each code unit in an ill-formed
% sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list.
% If strings use UTF-16 encoding, then each unpaired surrogate code point
% is returned as a separate code point in the list.
%
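% For example (an illustrative sketch, not taken from the library source),
% for a string containing only ASCII characters:
%
%     to_char_list("abc") = ['a', 'b', 'c']
%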
:- func to_char_list(string) = list(char).
:- pred to_char_list(string::in, list(char)::out) is det.
% Convert the string to a list of characters (code points) in reverse
% order.
%
% If strings use UTF-8 encoding, then each code unit in an ill-formed
% sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list.
% If strings use UTF-16 encoding, then each unpaired surrogate code point
% is returned as a separate code point in the list.
%
:- func to_rev_char_list(string) = list(char).
:- pred to_rev_char_list(string::in, list(char)::out) is det.
% Convert a list of characters (code points) to a string.
% Throws an exception if the list contains a null character or code point
% that cannot be encoded in a string. (Namely, surrogate code points cannot
% be encoded in UTF-8 strings.)
%
:- func from_char_list(list(char)::in) = (string::uo) is det.
:- pred from_char_list(list(char)::in, string::uo) is det.
% As above, but fail instead of throwing an exception if the list contains
% a null character or code point that cannot be encoded in a string.
%
:- pred semidet_from_char_list(list(char)::in, string::uo) is semidet.
% Same as from_char_list, except that it reverses the order
% of the characters.
% Throws an exception if the list contains a null character or code point
% that cannot be encoded in a string. (Namely, surrogate code points cannot
% be encoded in UTF-8 strings.)
%
:- func from_rev_char_list(list(char)::in) = (string::uo) is det.
:- pred from_rev_char_list(list(char)::in, string::uo) is det.
% As above, but fail instead of throwing an exception if the list contains
% a null character or code point that cannot be encoded in a string.
%
:- pred semidet_from_rev_char_list(list(char)::in, string::uo) is semidet.
% Convert a string into a list of code units of the string encoding used
% by the current process.
%
:- pred to_code_unit_list(string::in, list(int)::out) is det.
% Convert a string into a list of UTF-8 code units.
% Throws an exception if the string contains an unpaired surrogate code
% point, as the encoding of surrogate code points is prohibited in UTF-8.
%
:- pred to_utf8_code_unit_list(string::in, list(int)::out) is det.
% Convert a string into a list of UTF-16 code units.
% Throws an exception if strings use UTF-8 encoding and the given string
% contains an ill-formed code unit sequence, as arbitrary bytes cannot be
% represented in UTF-16 (even allowing for ill-formed sequences).
%
:- pred to_utf16_code_unit_list(string::in, list(int)::out) is det.
% Convert a list of code units to a string.
% Fails if the list does not contain a valid encoding of a string
% (in the encoding expected by the current process),
% or if the string would contain a null character.
%
:- pred from_code_unit_list(list(int)::in, string::uo) is semidet.
% Convert a list of code units to a string.
% The resulting string may contain ill-formed sequences.
% Fails if the list contains a code unit that is out of range
% or if the string would contain a null character.
%
:- pred from_code_unit_list_allow_ill_formed(list(int)::in, string::uo)
is semidet.
% Convert a list of UTF-8 code units to a string.
% Fails if the list does not contain a valid encoding of a string
% or if the string would contain a null character.
%
:- pred from_utf8_code_unit_list(list(int)::in, string::uo) is semidet.
% Convert a list of UTF-16 code units to a string.
% Fails if the list does not contain a valid encoding of a string
% or if the string would contain a null character.
%
:- pred from_utf16_code_unit_list(list(int)::in, string::uo) is semidet.
% duplicate_char(Char, Count, String):
%
% Construct a string consisting of Count occurrences of Char code points
% in sequence, returning the empty string if Count is less than or equal
% to zero. Throws an exception if Char is a null character or code point
% that cannot be encoded in a string. (Namely, surrogate code points cannot
% be encoded in UTF-8 strings.)
%
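% For example (an illustrative sketch):
%
%     duplicate_char('x', 3) = "xxx"
%     duplicate_char('x', 0) = ""
%     duplicate_char('x', -2) = ""
%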
:- func duplicate_char(char::in, int::in) = (string::uo) is det.
:- pred duplicate_char(char::in, int::in, string::uo) is det.
%--------------------------------------------------%
%
% Reading characters from strings.
%
% This type is used by the _repl indexing predicates to distinguish a
% U+FFFD code point that is actually in a string from a U+FFFD code point
% generated when the predicate encounters an ill-formed code unit sequence
% in a UTF-8 string.
%
:- type maybe_replaced
---> not_replaced
; replaced_code_unit(uint8).
% index(String, Index, Char):
%
% If Index is the initial code unit offset of a well-formed code unit
% sequence in String then Char is the code point encoded by that
% sequence.
%
% Otherwise, if Index is in range, Char is either a U+FFFD REPLACEMENT
% CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
% code point at Index (when strings are UTF-16 encoded).
%
% Fails if Index is out of range (negative, or greater than or equal to
% the length of String).
%
:- pred index(string::in, int::in, char::uo) is semidet.
% det_index(String, Index, Char):
%
% Like index/3 but throws an exception if Index is out of range
% (negative, or greater than or equal to the length of String).
%
:- func det_index(string, int) = char.
:- pred det_index(string::in, int::in, char::uo) is det.
% unsafe_index(String, Index, Char):
%
% Like index/3 but does not check that Index is in range.
%
% WARNING: behavior is UNDEFINED if Index is out of range
% (negative, or greater than or equal to the length of String).
% This version is constant time, whereas det_index
% may be linear in the length of the string. Use with care!
%
:- func unsafe_index(string, int) = char.
:- pred unsafe_index(string::in, int::in, char::uo) is det.
% A synonym for det_index/2:
% String ^ elem(Index) = det_index(String, Index).
%
:- func string ^ elem(int) = char.
% A synonym for unsafe_index/2:
% String ^ unsafe_elem(Index) = unsafe_index(String, Index).
%
:- func string ^ unsafe_elem(int) = char.
% index_next(String, Index, NextIndex, Char):
%
% Succeeds if and only if Index is between 0 and Len-1 (both inclusive)
% where Len is the number of code units in String.
%
% If Index is the initial code unit offset of a well-formed code unit
% sequence in String, then Char will be set to the code point encoded
% by that sequence, and NextIndex will be set to the offset of the code
% unit immediately following that sequence.
%
% If Index is *not* the initial code unit offset of a well-formed
% code unit sequence, NextIndex will be set to Index + 1, but the value
% of Char will depend on the string encoding used by the target platform.
%
% - On platforms that encode strings using UTF-8 (i.e. when targeting C)
% Char will be set to U+FFFD (the Unicode replacement character).
%
% - On platforms that encode strings using UTF-16 (i.e. when targeting
% C# or Java), Char will be set to the unpaired surrogate code point
% at Index. (For more details, see the comment just below.)
%
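% A minimal sketch of iterating over a string with index_next/4.
% The helper count_steps is hypothetical, not part of this module,
% and the caller is assumed to import the int module for `+'.
%
%     :- pred count_steps(string::in, int::in, int::in, int::out) is det.
%
%     count_steps(String, Index, !Count) :-
%         ( if string.index_next(String, Index, NextIndex, _Char) then
%             !:Count = !.Count + 1,
%             count_steps(String, NextIndex, !Count)
%         else
%             true
%         ).
%
% Calling count_steps(String, 0, 0, Count) takes one step per code point
% and one step per code unit in an ill-formed sequence, so Count should
% agree with count_code_points(String).
%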
:- pred index_next(string::in, int::in, int::out, char::uo) is semidet.
% index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
%
% Does the same job as index_next/4 but on success, it also returns
% MaybeReplaced, which will specify whether Char is the result
% of the replacement of a non-well-formed UTF-8 character with U+FFFD.
%
% On platforms that encode strings using UTF-8 (i.e. when targeting C),
% there are three cases.
%
% - If Char is not U+FFFD, then MaybeReplaced will be `not_replaced'.
%
% - If Char is U+FFFD because there is a well-formed code point encoded
% in String starting at Index, and that code point is U+FFFD, then
% MaybeReplaced will also be `not_replaced'.
%
% - If Char is U+FFFD but there is *no* well-formed code point encoded
% in String starting at Index, then MaybeReplaced will be
% `replaced_code_unit(CodeUnit)', where CodeUnit is the code unit
% at offset Index in String.
%
% On platforms that encode strings using UTF-16 (i.e. when targeting C#
% or Java), MaybeReplaced will always be bound to `not_replaced'.
% The only ways that a UTF-16 string may be non-well-formed are
%
% - by having a high surrogate code unit (between 0xD800 and 0xDBFF)
% that is not immediately followed by a low surrogate code unit
% (between 0xDC00 and 0xDFFF), or
%
% - by having a low surrogate code unit that is not immediately preceded
% by a high surrogate code unit.
%
% In both cases, index_next_repl will return the unpaired surrogate
% unchanged as Char. There is no replacement required, because
%
% - surrogate code units are all in the range 0xD800 to 0xDFFF, and
% - the Unicode standard deliberately does not assign any characters
% to the code points in this range.
%
% This means that if Char is in this range, then it must be an unpaired
% surrogate, but since Char actually appears in String, it won't be
% a *replacement* of another character.
%
:- pred index_next_repl(string::in, int::in, int::out, char::uo,
maybe_replaced::out) is semidet.
% unsafe_index_next(String, Index, NextIndex, Char):
%
% Like index_next/4 but does not check that Index is in range.
% Fails if Index is equal to the length of String.
%
% WARNING: behavior is UNDEFINED if Index is out of range
% (negative, or greater than the length of String).
%
:- pred unsafe_index_next(string::in, int::in, int::out, char::uo) is semidet.
% unsafe_index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
%
% Like index_next_repl/5 but does not check that Index is in range.
% Fails if Index is equal to the length of String.
%
% WARNING: behavior is UNDEFINED if Index is out of range
% (negative, or greater than the length of String).
%
:- pred unsafe_index_next_repl(string::in, int::in, int::out, char::uo,
maybe_replaced::out) is semidet.
% prev_index(String, Index, PrevIndex, Char):
%
% If Index - 1 is the final code unit offset of a well-formed sequence in
% String then Char is the code point encoded by that sequence, and
% PrevIndex is the initial code unit offset of that sequence.
%
% Otherwise, if Index is in range, Char is either a U+FFFD REPLACEMENT
% CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
% code point at Index - 1 (when strings are UTF-16 encoded), and
% PrevIndex is Index - 1.
%
% Fails if Index is out of range (non-positive, or greater than the
% length of String).
%
:- pred prev_index(string::in, int::in, int::out, char::uo) is semidet.
% prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced):
%
% Like prev_index/4 but also returns MaybeReplaced on success.
% When Char is not U+FFFD, then MaybeReplaced is always `not_replaced'.
% When Char is U+FFFD (the Unicode replacement character), then there are
% two cases:
%
% - If there is a U+FFFD code point encoded in String at
% [PrevIndex, Index) then MaybeReplaced is `not_replaced'.
%
% - Otherwise, MaybeReplaced is `replaced_code_unit(CodeUnit)' where
% CodeUnit is the code unit in String at Index - 1.
%
:- pred prev_index_repl(string::in, int::in, int::out, char::uo,
maybe_replaced::out) is semidet.
% unsafe_prev_index(String, Index, PrevIndex, Char):
%
% Like prev_index/4 but does not check that Index is in range.
% Fails if Index is zero.
%
% WARNING: behavior is UNDEFINED if Index is out of range
% (negative, or greater than the length of String).
%
:- pred unsafe_prev_index(string::in, int::in, int::out, char::uo) is semidet.
% unsafe_prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced):
%
% Like prev_index_repl/5 but does not check that Index is in range.
% Fails if Index is zero.
%
% WARNING: behavior is UNDEFINED if Index is out of range
% (negative, or greater than the length of String).
%
:- pred unsafe_prev_index_repl(string::in, int::in, int::out, char::uo,
maybe_replaced::out) is semidet.
% unsafe_index_code_unit(String, Index, CodeUnit):
%
% CodeUnit is the code unit in String at the offset Index.
% WARNING: behavior is UNDEFINED if Index is out of range
% (negative, or greater than or equal to the length of String).
%
:- pred unsafe_index_code_unit(string::in, int::in, int::out) is det.
%--------------------------------------------------%
%
% Writing characters to strings.
%
% set_char(Char, Index, String0, String):
%
% String is String0, with the code unit sequence beginning at Index
% replaced by the encoding of Char. If the code unit at Index is the
% initial code unit in a valid encoding of a code point, then that entire
% code unit sequence is replaced. Otherwise, only the code unit at Index
% is replaced.
%
% Fails if Index is out of range (negative, or greater than or equal to
% the length of String0).
%
% Throws an exception if Char is the null character or a code point that
% cannot be encoded in a string (namely, surrogate code points cannot be
% encoded in UTF-8 strings).
%
:- pred set_char(char, int, string, string).
:- mode set_char(in, in, in, out) is semidet.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode set_char(in, in, di, uo) is semidet.
% det_set_char(Char, Index, String0, String):
%
% Same as set_char/4 but throws an exception if Index is out of range
% (negative, or greater than or equal to the length of String0).
%
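% For example (an illustrative sketch, using ASCII characters only so that
% each code point occupies exactly one code unit):
%
%     det_set_char('X', 1, "abc") = "aXc"
%     det_set_char('X', 3, "abc")     % throws: Index is out of range
%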
:- func det_set_char(char, int, string) = string.
:- pred det_set_char(char, int, string, string).
:- mode det_set_char(in, in, in, out) is det.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode det_set_char(in, in, di, uo) is det.
% unsafe_set_char(Char, Index, String0, String):
%
% Same as set_char/4 but does not check if Index is in range.
% WARNING: behavior is UNDEFINED if Index is out of range
% (negative, or greater than or equal to the length of String0).
% Use with care!
%
:- func unsafe_set_char(char, int, string) = string.
:- mode unsafe_set_char(in, in, in) = out is det.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode unsafe_set_char(in, in, di) = uo is det.
:- pred unsafe_set_char(char, int, string, string).
:- mode unsafe_set_char(in, in, in, out) is det.
% NOTE This mode is disabled because the compiler puts constant strings
% into static data even when they might be updated.
% :- mode unsafe_set_char(in, in, di, uo) is det.
%--------------------------------------------------%
%
% Determining the lengths of strings.
%
% Determine the length of a string, in code units.
% An empty string has length zero.
%
% NOTE: code points (characters) are encoded using one or more code units,
% i.e. bytes for UTF-8; 16-bit integers for UTF-16.
%
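% For example (an illustrative sketch), the string "caf\u00e9" holds four
% code points, the last of which is U+00E9:
%
%     length("caf\u00e9") = 5     % when targeting C (UTF-8)
%     length("caf\u00e9") = 4     % when targeting Java or C# (UTF-16)
%     count_code_points("caf\u00e9") = 4     % on every target
%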
:- func length(string::in) = (int::uo) is det.
:- pred length(string, int).
:- mode length(in, uo) is det.
:- mode length(ui, uo) is det.
% Synonyms for length.
%
:- func count_code_units(string) = int.
:- pred count_code_units(string::in, int::out) is det.
% Determine the number of code points in a string.
%
% Each valid code point, and each code unit that is part of an ill-formed
% sequence, contributes one to the result.
% (This matches the number of steps it would take to iterate over the
% string using string.index_next or string.prev_index.)
%
:- func count_code_points(string) = int.
:- pred count_code_points(string::in, int::out) is det.
% NOTE We are changing all occurrences of "codepoint" in the
% names of predicates and functions to "code_point", for consistency
% with predicate and function names that talk about code_units.
%
:- func count_codepoints(string) = int.
:- pred count_codepoints(string::in, int::out) is det.
:- pragma obsolete(func(count_codepoints/1), [count_code_points/1]).
:- pragma obsolete(pred(count_codepoints/2), [count_code_points/2]).
% count_utf8_code_units(String) = Length:
%
% Return the number of code units required to represent a string in
% UTF-8 encoding (with allowance for ill-formed sequences).
% Equivalent to Length = length(to_utf8_code_unit_list(String)).
%
% Throws an exception if strings use UTF-16 encoding but the given string
% contains an unpaired surrogate code point. Surrogate code points cannot
% be represented in UTF-8.
%
:- func count_utf8_code_units(string) = int.
% code_point_offset(String, StartOffset, Count, Offset):
%
% Let S be the substring of String from code unit StartOffset to the
% end of the string. Offset is the code unit offset after advancing Count
% steps in S, where each step skips over either:
% - one encoding of a Unicode code point, or
% - one code unit that is part of an ill-formed sequence.
%
% Fails if StartOffset is out of range (negative, or greater than the
% length of String), or if there are fewer than Count steps possible in S.
%
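% For example (an illustrative sketch), advancing two steps from the
% start of "a\u00e9b" skips `a' and U+00E9:
%
%     code_point_offset("a\u00e9b", 0, 2, Offset)
%         % Offset = 3 when targeting C (UTF-8)
%         % Offset = 2 when targeting Java or C# (UTF-16)
%
% The same call with Count = 4 fails, since only three steps are possible.
%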
:- pred code_point_offset(string::in, int::in, int::in, int::out) is semidet.
% NOTE We are changing all occurrences of "codepoint" in the
% names of predicates and functions to "code_point", for consistency
% with predicate and function names that talk about code_units.
%
:- pred codepoint_offset(string::in, int::in, int::in, int::out) is semidet.
:- pragma obsolete(pred(codepoint_offset/4), [code_point_offset/4]).
% code_point_offset(String, Count, Offset):
%
% Same as `code_point_offset(String, 0, Count, Offset)'.
%
:- pred code_point_offset(string::in, int::in, int::out) is semidet.
% NOTE We are changing all occurrences of "codepoint" in the
% names of predicates and functions to "code_point", for consistency
% with predicate and function names that talk about code_units.
%
:- pred codepoint_offset(string::in, int::in, int::out) is semidet.
:- pragma obsolete(pred(codepoint_offset/3), [code_point_offset/3]).
%--------------------------------------------------%
%
% Computing hashes of strings.
%
% Compute a hash value for a string.
%
:- func hash(string) = int.
:- pred hash(string::in, int::out) is det.
% Two other hash functions for strings.
%
:- func hash2(string) = int.
:- func hash3(string) = int.
% Cross-compilation-friendly versions of hash, hash2 and hash3
% respectively.
:- func hash4(string) = int.
:- func hash5(string) = int.
:- func hash6(string) = int.
%--------------------------------------------------%
%
% Tests on strings.
%
% True if string is the empty string.
%
:- pred is_empty(string::in) is semidet.
% True if the string is a valid UTF-8 or UTF-16 string.
% In target languages that use UTF-8 string encoding, `is_well_formed(S)'
% is true if-and-only-if S consists of a well-formed UTF-8 code unit
% sequence.
% In target languages that use UTF-16 string encoding, `is_well_formed(S)'
% is true if-and-only-if S consists of a well-formed UTF-16 code unit
% sequence.
%
:- pred is_well_formed(string::in) is semidet.
% Values of this type record whether a string is well or ill formed.
% In the latter case, the integer gives the offset in the string
% (as a count of either UTF-8 or UTF-16 code units, depending on the
% target language) of the first position at which the string departs
% from well-formedness.
%
:- type well_or_ill_formed
---> well_formed
; ill_formed(int).
% Does the same job as is_well_formed, but if the string is NOT well
% formed, it will return the offset (as a count of code units) of the
% first position at which the string departs from well-formedness.
%
:- pred check_well_formedness(string::in, well_or_ill_formed::out) is det.
% True if string contains only alphabetic characters [A-Za-z].
%
:- pred is_all_alpha(string::in) is semidet.
% True if string contains only alphabetic characters [A-Za-z] and digits
% [0-9].
%
:- pred is_all_alnum(string::in) is semidet.
% True if string contains only alphabetic characters [A-Za-z] and
% underscores.
%
:- pred is_all_alpha_or_underscore(string::in) is semidet.
% True if string contains only alphabetic characters [A-Za-z],
% digits [0-9], and underscores.
%
:- pred is_all_alnum_or_underscore(string::in) is semidet.
% True if the string contains only decimal digits (0-9).
%
:- pred is_all_digits(string::in) is semidet.
% all_match(TestPred, String):
%
% True if-and-only-if all code points in String satisfy TestPred,
% and String contains no ill-formed code unit sequences.
%
:- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet.
% contains_match(TestPred, String):
%
% True if-and-only-if String contains at least one code point
% that satisfies TestPred. Any ill-formed code unit sequences in String
% are ignored as they do not encode code points.
%
:- pred contains_match(pred(char)::in(pred(in) is semidet), string::in)
is semidet.
% contains_char(String, Char):
%
% Succeed if the code point Char occurs in String.
% Any ill-formed code unit sequences within String are ignored
% as they will not contain Char.
%
:- pred contains_char(string::in, char::in) is semidet.
% compare_substrings(Res, X, StartX, Y, StartY, Length):
%
% Compare two substrings by code unit order. The two substrings are
% the substring of X between StartX and StartX + Length, and
% the substring of Y between StartY and StartY + Length.
% StartX, StartY and Length are all in terms of code units.
%
% Fails if StartX or StartX + Length are not within [0, length(X)],
% or if StartY or StartY + Length are not within [0, length(Y)],
% or if Length is negative.
%
:- pred compare_substrings(comparison_result::uo, string::in, int::in,
string::in, int::in, int::in) is semidet.
% unsafe_compare_substrings(Res, X, StartX, Y, StartY, Length):
%
% Same as compare_substrings/6 but without range checks.
% WARNING: if any of StartX, StartY, StartX + Length or
% StartY + Length are out of range, or if Length is negative,
% then the behaviour is UNDEFINED. Use with care!
%
:- pred unsafe_compare_substrings(comparison_result::uo, string::in, int::in,
string::in, int::in, int::in) is det.
% compare_ignore_case_ascii(Res, X, Y):
%
% Compare two strings by code unit order, ignoring the case of letters
% (A-Z, a-z) in the ASCII range.
% Equivalent to `compare(Res, to_lower(X), to_lower(Y))'
% but more efficient.
%
:- pred compare_ignore_case_ascii(comparison_result::uo,
string::in, string::in) is det.
% prefix_length(Pred, String):
%
% The length (in code units) of the maximal prefix of String consisting
% entirely of code points satisfying Pred.
%
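% For example (an illustrative sketch, using char.is_digit as Pred):
%
%     prefix_length(char.is_digit, "123abc") = 3
%     prefix_length(char.is_digit, "abc") = 0
%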
:- func prefix_length(pred(char)::in(pred(in) is semidet), string::in)
= (int::out) is det.
% suffix_length(Pred, String):
%
% The length (in code units) of the maximal suffix of String consisting
% entirely of code points satisfying Pred.
%
:- func suffix_length(pred(char)::in(pred(in) is semidet), string::in)
= (int::out) is det.
% sub_string_search(String, SubString, Index):
%
% Index is the code unit position in String where the first
% occurrence of SubString begins. Indices start at zero, so if
% SubString is a prefix of String, this will return Index = 0.
%
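% For example (an illustrative sketch):
%
%     sub_string_search("hello world", "world", Index)   % Index = 6
%     sub_string_search("hello world", "hello", Index)   % Index = 0
%     sub_string_search("hello world", "xyz", _)         % fails
%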
:- pred sub_string_search(string::in, string::in, int::out) is semidet.
% sub_string_search_start(String, SubString, BeginAt, Index):
%
% Index is the code unit position in String where the first
% occurrence of SubString occurs such that 'Index' is greater than or
% equal to BeginAt. Indices start at zero.
% Fails if BeginAt is either negative or greater than
% length(String) - length(SubString).
%
:- pred sub_string_search_start(string::in, string::in, int::in, int::out)
is semidet.
% unsafe_sub_string_search_start(String, SubString, BeginAt, Index):
%
% Same as sub_string_search_start/4 but does not check that BeginAt
% is in range.
% WARNING: if BeginAt is either negative, or greater than length(String),
% then the behaviour is UNDEFINED. Use with care!
%
:- pred unsafe_sub_string_search_start(string::in, string::in, int::in,
int::out) is semidet.
% find_first_char(String, Char, Index):
%
% Find the first occurrence of the code point Char in String.
% On success, Index is the code unit offset of that code point.
%
:- pred find_first_char(string::in, char::in, int::out) is semidet.
% find_first_char_start(String, Char, BeginAt, Index):
%
% Find the first occurrence of the code point Char in String,
% beginning from the code unit offset BeginAt in String.
% On success, Index is the code unit offset of that code point.
%
% Fails if BeginAt is out of range (negative, or greater than or equal
% to the length of String).
%
:- pred find_first_char_start(string::in, char::in, int::in, int::out)
is semidet.
% unsafe_find_first_char_start(String, Char, BeginAt, Index):
%
% Same as find_first_char_start/4 but does not check that BeginAt
% is in range.
% WARNING: if BeginAt is either negative, or greater than length(String),
% then the behaviour is UNDEFINED. Use with care!
%
:- pred unsafe_find_first_char_start(string::in, char::in, int::in, int::out)
is semidet.
% find_last_char(String, Char, Index):
%
% Find the last occurrence of the code point Char in String.
% On success, Index is the code unit offset of that code point.
%
:- pred find_last_char(string::in, char::in, int::out) is semidet.
%--------------------------------------------------%
%
% Appending strings.
%
% Append two strings together.
%
:- func append(string::in, string::in) = (string::uo) is det.
% append(S1, S2, S3):
%
% Append two strings together. S3 consists of the code units of S1
% followed by the code units of S2, in order.
%
% An ill-formed code unit sequence at the end of S1 may join with an
% ill-formed code unit sequence at the start of S2 to produce a valid
% encoding of a code point in S3.
%
:- pred append(string, string, string).
:- mode append(in, in, in) is semidet. % implied
:- mode append(in, uo, in) is semidet.
:- mode append(in, in, uo) is det.
:- mode append(uo, in, in) is semidet.
% nondet_append(S1, S2, S3):
%
% Non-deterministically return S1 and S2, where S1 ++ S2 = S3.
% S3 is split after each code point or code unit in an ill-formed sequence.
%
:- pred nondet_append(string::out, string::out, string::in) is multi.
% S1 ++ S2 = S :- append(S1, S2, S).
%
% Append two strings together using nicer inline syntax.
%
:- func string ++ string = string.
:- mode in ++ in = uo is det.
% Append a list of strings together.
%
:- func append_list(list(string)::in) = (string::uo) is det.
:- pred append_list(list(string)::in, string::uo) is det.
% join_list(Separator, Strings) = JoinedString:
%
% Append together the strings in Strings, putting Separator between
% each pair of adjacent strings. If Strings is the empty list,
% return the empty string.
%
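% For example (an illustrative sketch):
%
%     join_list(", ", ["a", "b", "c"]) = "a, b, c"
%     join_list(", ", ["a"]) = "a"
%     join_list(", ", []) = ""
%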
:- func join_list(string::in, list(string)::in) = (string::uo) is det.
%--------------------------------------------------%
%
% Making strings from smaller pieces.
%
:- type string_piece
---> string(string)
; substring(string, int, int). % string, start, end offset
% append_string_pieces(Pieces, String):
%
% Append together the strings and substrings in Pieces into a string.
% Throws an exception if Pieces contains an element
% `substring(S, Start, End)' where Start or End are not within
% the range [0, length(S)], or if Start > End.
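%
% A sketch, assuming that End is an exclusive code unit offset
% (as the permitted range [0, length(S)] suggests):
%
% append_string_pieces(
% [string("foo"), substring("hello world", 6, 11)], "fooworld")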
%
:- pred append_string_pieces(list(string_piece)::in, string::uo) is det.
% Same as append_string_pieces/2 but without range checks.
% WARNING: if any piece `substring(S, Start, End)' has Start or End
% outside the range [0, length(S)], or if Start > End,
% then the behaviour is UNDEFINED. Use with care!
%
:- pred unsafe_append_string_pieces(list(string_piece)::in, string::uo)
is det.
%--------------------------------------------------%
%
% Splitting up strings.
%
% first_char(String, Char, Rest) is true if-and-only-if String begins
% with a well-formed code unit sequence, Char is the code point encoded by
% that sequence, and Rest is the rest of String after that sequence.
%
% The (uo, in, in) mode throws an exception if Char cannot be encoded in
% a string, or if Char is a surrogate code point (for consistency with
% the other modes).
%
% WARNING: first_char makes a copy of Rest because the garbage collector
% doesn't handle references into the middle of an object, at least not the
% way we use it. This means that repeated use of first_char to iterate
% over a string will result in very poor performance. If you want to
% iterate over the characters in a string, use foldl or to_char_list
% instead.
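%
% For example, these are true:
%
% first_char("abc", 'a', "bc")
% first_char(String, 'a', "bc") ==> String = "abc"
%
% whereas first_char("", _, _) fails.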
%
:- pred first_char(string, char, string).
:- mode first_char(in, in, in) is semidet. % implied
:- mode first_char(in, uo, in) is semidet. % implied
:- mode first_char(in, in, uo) is semidet. % implied
:- mode first_char(in, uo, uo) is semidet.
:- mode first_char(uo, in, in) is det.
% split(String, Index, LeftSubstring, RightSubstring):
%
% Split a string into two substrings at the code unit offset Index.
% (If Index is out of the range [0, length of String], it is treated
% as if it were the nearest end-point of that range.)
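%
% To illustrate the clamping of Index, these are true:
%
% split("hello", 2, "he", "llo")
% split("hello", 99, "hello", "")
% split("hello", -1, "", "hello")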
%
:- pred split(string::in, int::in, string::out, string::out) is det.
% split_by_code_point(String, Count, LeftSubstring, RightSubstring):
%
% LeftSubstring is the left-most Count code points of String,
% and RightSubstring is the remainder of String.
% (If Count is out of the range [0, count_code_points(String)], it is treated
% as if it were the nearest end-point of that range.)
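%
% For example, this is true:
%
% split_by_code_point("h\u00E9llo", 2, "h\u00E9", "llo")
%
% Assuming UTF-8 encoded strings, the same split point is at code unit
% offset 3, because U+00E9 occupies two code units.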
%
:- pred split_by_code_point(string::in, int::in, string::out, string::out)
is det.
% NOTE We are changing all occurrences of "codepoint" in the
% names of predicates and functions to "code_point", for consistency
% with predicate and function names that talk about code_units.
%
:- pred split_by_codepoint(string::in, int::in, string::out, string::out)
is det.
:- pragma obsolete(pred(split_by_codepoint/4), [split_by_code_point/4]).
% left(String, Count, LeftSubstring):
%
% LeftSubstring is the left-most Count code units of String.
% (If Count is out of the range [0, length of String], it is treated
% as if it were the nearest end-point of that range.)
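%
% For example, these are true:
%
% left("hello", 2) = "he"
% left("hello", 99) = "hello"
% left("hello", -1) = ""
%
% The corresponding call right("hello", 2) returns "lo".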
%
:- func left(string::in, int::in) = (string::out) is det.
:- pred left(string::in, int::in, string::out) is det.
% left_by_code_point(String, Count, LeftSubstring):
%
% LeftSubstring is the left-most Count code points of String.
% (If Count is out of the range [0, count_code_points(String)], it is treated
% as if it were the nearest end-point of that range.)
%
:- func left_by_code_point(string::in, int::in) = (string::out) is det.
:- pred left_by_code_point(string::in, int::in, string::out) is det.
% NOTE We are changing all occurrences of "codepoint" in the
% names of predicates and functions to "code_point", for consistency
% with predicate and function names that talk about code_units.
%
:- func left_by_codepoint(string::in, int::in) = (string::out) is det.
:- pred left_by_codepoint(string::in, int::in, string::out) is det.
:- pragma obsolete(func(left_by_codepoint/2), [left_by_code_point/2]).
:- pragma obsolete(pred(left_by_codepoint/3), [left_by_code_point/3]).
% right(String, Count, RightSubstring):
%
% RightSubstring is the right-most Count code units of String.
% (If Count is out of the range [0, length of String], it is treated
% as if it were the nearest end-point of that range.)
%
:- func right(string::in, int::in) = (string::out) is det.
:- pred right(string::in, int::in, string::out) is det.
% right_by_code_point(String, Count, RightSubstring):
%
% RightSubstring is the right-most Count code points of String.
% (If Count is out of the range [0, count_code_points(String)], it is treated
% as if it were the nearest end-point of that range.)
%
:- func right_by_code_point(string::in, int::in) = (string::out) is det.
:- pred right_by_code_point(string::in, int::in, string::out) is det.
% NOTE We are changing all occurrences of "codepoint" in the
% names of predicates and functions to "code_point", for consistency
% with predicate and function names that talk about code_units.
%
:- func right_by_codepoint(string::in, int::in) = (string::out) is det.
:- pred right_by_codepoint(string::in, int::in, string::out) is det.
:- pragma obsolete(func(right_by_codepoint/2), [right_by_code_point/2]).
:- pragma obsolete(pred(right_by_codepoint/3), [right_by_code_point/3]).
% between(String, Start, End, Substring):
%
% Substring consists of the segment of String within the half-open
% interval [Start, End), where Start and End are code unit offsets.
% (If Start is out of the range [0, length of String], it is treated
% as if it were the nearest end-point of that range.
% If End is out of the range [Start, length of String],
% it is treated as if it were the nearest end-point of that range.)
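%
% To illustrate the half-open interval and the clamping rules,
% these are true:
%
% between("hello world", 6, 11) = "world"
% between("hello", -5, 3) = "hel"
% between("hello", 3, 1) = ""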
%
:- func between(string::in, int::in, int::in) = (string::uo) is det.
:- pred between(string::in, int::in, int::in, string::uo) is det.
% between_code_points(String, Start, End, Substring):
%
% Substring is the part of String between the code point positions
% Start and End. The result is equivalent to:
%
% between(String, StartOffset, EndOffset, Substring)
%
% where:
%
% StartOffset is from code_point_offset(String, Start, StartOffset)
% if Start is in [0, count_code_points(String)],
% StartOffset = 0 if Start < 0,
% StartOffset = length(String) otherwise;
%
% EndOffset is from code_point_offset(String, End, EndOffset)
% if End is in [0, count_code_points(String)],
% EndOffset = 0 if End < 0,
% EndOffset = length(String) otherwise.
%
% between/4 will enforce StartOffset =< EndOffset.
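%
% For example, counting in code points rather than code units,
% this is true in both the UTF-8 and UTF-16 encodings:
%
% between_code_points("h\u00E9llo", 1, 3) = "\u00E9l"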
%
:- func between_code_points(string::in, int::in, int::in)
= (string::uo) is det.
:- pred between_code_points(string::in, int::in, int::in, string::uo) is det.
% NOTE We are changing all occurrences of "codepoint" in the
% names of predicates and functions to "code_point", for consistency
% with predicate and function names that talk about code_units.
%
:- func between_codepoints(string::in, int::in, int::in)
= (string::uo) is det.
:- pred between_codepoints(string::in, int::in, int::in, string::uo) is det.
:- pragma obsolete(func(between_codepoints/3), [between_code_points/3]).
:- pragma obsolete(pred(between_codepoints/4), [between_code_points/4]).
% unsafe_between(String, Start, End, Substring):
%
% Substring consists of the segment of String within the half-open
% interval [Start, End), where Start and End are code unit offsets.
% WARNING: if Start is out of the range [0, length of String] or
% End is out of the range [Start, length of String]
% then the behaviour is UNDEFINED. Use with care!
% This version takes time proportional to the length of the substring,
% whereas between may take time proportional to the length
% of the whole string.
%
:- func unsafe_between(string::in, int::in, int::in) = (string::uo) is det.
:- pred unsafe_between(string::in, int::in, int::in, string::uo) is det.
% words_separator(SepP, String) returns the list of non-empty
% substrings of String (in first to last order) that are delimited
% by non-empty sequences of code points matched by SepP.
% For example,
%
% words_separator(char.is_whitespace, " the cat  sat on the  mat") =
% ["the", "cat", "sat", "on", "the", "mat"]
%
% Note the difference to split_at_separator.
%
:- func words_separator(pred(char), string) = list(string).
:- mode words_separator(in(pred(in) is semidet), in) = out is det.
% words(String) =
% words_separator(char.is_whitespace, String).
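%
% For example, leading, trailing and repeated whitespace yields
% no empty strings:
%
% words("  the cat  sat ") = ["the", "cat", "sat"]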
%
:- func words(string) = list(string).
% split_at_separator(SepP, String) returns the list of (possibly empty)
% substrings of String (in first to last order) that are delimited
% by code points matched by SepP. For example,
%
% split_at_separator(char.is_whitespace, " the cat  sat on the  mat")
% = ["", "the", "cat", "", "sat", "on", "the", "", "mat"]
%
% Note the difference to words_separator.
%
:- func split_at_separator(pred(char), string) = list(string).
:- mode split_at_separator(in(pred(in) is semidet), in) = out is det.
% split_at_char(Char, String) =
% split_at_separator(unify(Char), String)
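%
% For example, adjacent separators delimit an empty field:
%
% split_at_char(',', "a,b,,c") = ["a", "b", "", "c"]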
%
:- func split_at_char(char, string) = list(string).
% split_at_string(Separator, String) returns the list of substrings
% of String that are delimited by Separator. For example,
%
% split_at_string("|||", "|||fld2|||fld3") = ["", "fld2", "fld3"]
%
% The first (leftmost) match of Separator is always used to break
% the String. For example:
%
% split_at_string("aa", "xaaayaaaz") = ["x", "ay", "az"]
%
:- func split_at_string(string, string) = list(string).
% split_into_lines(String) breaks String into a sequence of lines,
% with each line consisting of a possibly empty sequence of non-newline
% characters, followed either by a newline character, or by the end
% of the string. The string returned for a line will not contain
% the newline character.
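%
% For example, these are true:
%
% split_into_lines("line1\nline2") = ["line1", "line2"]
% split_into_lines("a\n\nb") = ["a", "", "b"]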
%
:- func split_into_lines(string) = list(string).
%--------------------------------------------------%
%
% Dealing with prefixes and suffixes.
%
% prefix(String, Prefix) is true if-and-only-if
% Prefix is a prefix of String.
% Same as append(Prefix, _, String).
%
:- pred prefix(string::in, string::in) is semidet.
% suffix(String, Suffix) is true if-and-only-if
% Suffix is a suffix of String.
% Same as append(_, Suffix, String).
%
:- pred suffix(string::in, string::in) is semidet.
% remove_prefix(Prefix, String, Suffix):
%
% This is a synonym for append(Prefix, Suffix, String) but with the
% arguments in a more convenient order for use with higher-order code.
%
% WARNING: the argument order differs from remove_suffix.
%
:- pred remove_prefix(string::in, string::in, string::out) is semidet.
% det_remove_prefix(Prefix, String, Suffix):
%
% This is like remove_prefix, except that it throws an exception
% if String does not begin with Prefix, instead of failing.
%
% WARNING: the argument order differs from remove_suffix.
%
:- pred det_remove_prefix(string::in, string::in, string::out) is det.
% remove_prefix_if_present(Prefix, String) = Suffix returns String minus
% Prefix if String begins with Prefix, and String if it doesn't.
%
:- func remove_prefix_if_present(string, string) = string.
% add_prefix(Prefix, Str) = PrefixStr:
%
% Does the same job as Prefix ++ Str = PrefixStr, but allows
% using list.map to add the same prefix to many strings.
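%
% For example (the word list is an arbitrary illustration):
%
% list.map(add_prefix("un"), ["do", "tie"]) = ["undo", "untie"]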
%
:- func add_prefix(string, string) = string.
% remove_suffix(String, Suffix, Prefix):
%
% The same as append(Prefix, Suffix, String).
%
% WARNING: the argument order differs from both remove_prefix and
% remove_suffix_if_present.
%
:- pred remove_suffix(string::in, string::in, string::out) is semidet.
% det_remove_suffix(String, Suffix) returns the same value as
% remove_suffix, except it throws an exception if String does not end
% with Suffix.
%
% WARNING: the argument order differs from both remove_prefix and
% remove_suffix_if_present.
%
:- func det_remove_suffix(string, string) = string.
% remove_suffix_if_present(Suffix, String) returns String minus Suffix
% if String ends with Suffix, and String if it doesn't.
%
% WARNING: the argument order differs from remove_suffix and
% det_remove_suffix.
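%
% To illustrate the differing argument orders, these are all true:
%
% remove_prefix("foo", "foobar", "bar")
% remove_suffix("foobar", "bar", "foo")
% remove_prefix_if_present("foo", "foobar") = "bar"
% remove_suffix_if_present("bar", "foobar") = "foo"
% remove_suffix_if_present("xyz", "foobar") = "foobar"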
%
:- func remove_suffix_if_present(string, string) = string.
% add_suffix(Suffix, Str) = StrSuffix:
%
% Does the same job as Str ++ Suffix = StrSuffix, but allows
% using list.map to add the same suffix to many strings.
%
:- func add_suffix(string, string) = string.
%--------------------------------------------------%
%
% Transformations of strings.
%
% Convert the first character (if any) of a string to uppercase.
% Only letters (a-z) in the ASCII range are converted.
%
% This function transforms the initial code point of a string,
% whether or not the code point occurs as part of a combining sequence.
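%
% For example, these are true:
%
% capitalize_first("hello") = "Hello"
% capitalize_first("\u00E9cole") = "\u00E9cole" % é is not ASCII
% capitalize_first("") = ""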
%
:- func capitalize_first(string) = string.
:- pred capitalize_first(string::in, string::out) is det.
% Convert the first character (if any) of a string to lowercase.
% Only letters (A-Z) in the ASCII range are converted.
%
% This function transforms the initial code point of a string,
% whether or not the code point occurs as part of a combining sequence.
%
:- func uncapitalize_first(string) = string.
:- pred uncapitalize_first(string::in, string::out) is det.
% Converts a string to uppercase.
% Only letters (a-z) in the ASCII range are converted.
%
% This function transforms each code point individually.
% Letters that occur within a combining sequence will be converted,
% whereas the precomposed character equivalent to the combining
% sequence would not be converted. For example:
%
% to_upper("a\u0301") ==> "A\u0301" % á decomposed
% to_upper("\u00E1") ==> "\u00E1" % á precomposed
%
:- func to_upper(string::in) = (string::uo) is det.
:- pred to_upper(string, string).
:- mode to_upper(in, uo) is det.
:- mode to_upper(in, in) is semidet. % implied
% Converts a string to lowercase.
% Only letters (A-Z) in the ASCII range are converted.
%
% This function transforms each code point individually.
% Letters that occur within a combining sequence will be converted,
% whereas the precomposed character equivalent to the combining
% sequence would not be converted. For example:
%
% to_lower("A\u0301") ==> "a\u0301" % Á decomposed
% to_lower("\u00C1") ==> "\u00C1" % Á precomposed
%
:- func to_lower(string::in) = (string::uo) is det.
:- pred to_lower(string, string).
:- mode to_lower(in, uo) is det.
:- mode to_lower(in, in) is semidet. % implied
% pad_left(String0, PadChar, Width, String):
%
% Insert PadChars at the left of String0 until it is at least as long
% as Width, giving String. Width is currently measured as the number
% of code points.
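%
% For example, these are true:
%
% pad_left("42", '0', 5) = "00042"
% pad_left("hello", '0', 3) = "hello"
%
% The corresponding call pad_right("ab", '.', 4) returns "ab..".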
%
:- func pad_left(string, char, int) = string.
:- pred pad_left(string::in, char::in, int::in, string::out) is det.
% pad_right(String0, PadChar, Width, String):
%
% Insert PadChars at the right of String0 until it is at least as long
% as Width, giving String. Width is currently measured as the number
% of code points.
%
:- func pad_right(string, char, int) = string.
:- pred pad_right(string::in, char::in, int::in, string::out) is det.
% chomp(String):
%
% Return String minus any single trailing newline character.
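%
% At most one newline is removed. For example, these are true:
%
% chomp("line\n") = "line"
% chomp("line\n\n") = "line\n"
% chomp("line") = "line"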
%
:- func chomp(string) = string.
% strip(String):
%
% Returns String minus any initial and trailing ASCII whitespace
% characters, i.e. characters satisfying char.is_whitespace.
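%
% For example, comparing strip with lstrip and rstrip on the same
% (arbitrary) input, these are true:
%
% strip("\t hi \n") = "hi"
% lstrip("\t hi \n") = "hi \n"
% rstrip("\t hi \n") = "\t hi"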
%
:- func strip(string) = string.
% lstrip(String):
%
% Return String minus any initial ASCII whitespace characters,
% i.e. characters satisfying char.is_whitespace.
%
:- func lstrip(string) = string.
% rstrip(String):
%
% Returns String minus any trailing ASCII whitespace characters,
% i.e. characters satisfying char.is_whitespace.
%
:- func rstrip(string) = string.
% lstrip_pred(Pred, String):
%
% Returns String minus the maximal prefix consisting entirely
% of code points satisfying Pred.
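%
% For example, using char.is_digit as Pred:
%
% lstrip_pred(char.is_digit, "123abc") = "abc"
% lstrip_pred(char.is_digit, "abc123") = "abc123"
%
% The corresponding call rstrip_pred(char.is_digit, "abc123")
% returns "abc".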
%
:- func lstrip_pred(pred(char)::in(pred(in) is semidet), string::in)
= (string::out) is det.
% rstrip_pred(Pred, String):
%
% Returns String minus the maximal suffix consisting entirely
% of code points satisfying Pred.
%
:- func rstrip_pred(pred(char)::in(pred(in) is semidet), string::in)
= (string::out) is det.
% replace(String0, Pattern, Subst, String):
%
% Replaces the first occurrence of Pattern in String0 with Subst to give
% String. Fails if Pattern does not occur in String0.
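%
% For example, this is true:
%
% replace("aXbXc", "X", "-", "a-bXc")
%
% whereas replace("abc", "X", "-", _) fails. By contrast,
% replace_all("aXbXc", "X", "-") returns "a-b-c".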
%
:- pred replace(string::in, string::in, string::in, string::uo) is semidet.
% replace_all(String0, Pattern, Subst, String):
%
% Replaces any occurrences of Pattern in String0 with Subst to give
% String.
%
% If Pattern is the empty string then Subst is inserted at every point
% in String0 except between two code units in an encoding of a code point.
% For example, these are true:
%
% replace_all("", "", "|", "|")
% replace_all("a", "", "|", "|a|")
% replace_all("ab", "", "|", "|a|b|")
%
:- func replace_all(string::in, string::in, string::in) = (string::uo) is det.
:- pred replace_all(string::in, string::in, string::in, string::uo) is det.
% replace_all_sv(Pattern, Subst, String0, String):
%
% Does the exact same job as replace_all, but takes the arguments
% in a different order. The advantage is that this order is easier to use
% with higher order code such as
%
% list.map(replace_all_sv(Pattern, Subst), Strings0, Strings)
%
% and with state variables, in code such as
%
% list.map(replace_all_sv(Pattern, Subst), !Strings)
%
:- pred replace_all_sv(string::in, string::in, string::in, string::uo) is det.
% word_wrap(Str, LineLen) = Wrapped:
%
% Wrapped is Str with newlines inserted between words (separated by ASCII
% space characters) so that at most LineLen code points appear on any line,
% and each line contains as many whole words as possible subject to that
% constraint. If any one word exceeds LineLen code points in length, then
% it will be broken over two (or more) lines. Sequences of whitespace
% characters are replaced by a single space.
%
% See char.is_whitespace for the definition of whitespace characters
% used by this predicate.
%
:- func word_wrap(string, int) = string.
% word_wrap_separator(Str, LineLen, BrokenWordSeparator) = Wrapped:
%
% word_wrap_separator/3 is like word_wrap/2, except that words that
% need to be broken up over multiple lines have BrokenWordSeparator
% inserted between each pair of pieces. If the number of code points in
% BrokenWordSeparator is greater than or equal to LineLen, then this
% function ignores the separator, since it would leave no room on a line
% for any actual words.
%
:- func word_wrap_separator(string, int, string) = string.
%--------------------------------------------------%
%
% Folds over the characters in strings.
%
% foldl(Pred, String, !Acc):
%
% Pred is an accumulator predicate which is to be called for each
% code point of the string String in turn.
% If String contains ill-formed sequences, Pred is called for each
% code unit in an ill-formed sequence. If strings use UTF-8 encoding,
% U+FFFD is passed to Pred in place of each such code unit.
% If strings use UTF-16 encoding, each code unit in an ill-formed sequence
% is an unpaired surrogate code point, which will be passed to Pred.
%
% The initial value of the accumulator is !.Acc and the final value is
% !:Acc.
% (foldl(Pred, String, !Acc) is equivalent to
% to_char_list(String, Chars),
% list.foldl(Pred, Chars, !Acc)
% but is implemented more efficiently.)
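%
% As a sketch using the function version, counting the code points
% of a string and collecting them (in reverse) into a list:
%
% foldl((func(_, N) = N + 1), "abc", 0) = 3
% foldl(list.cons, "abc", []) = ['c', 'b', 'a']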
%
:- func foldl(func(char, A) = A, string, A) = A.
:- pred foldl(pred(char, A, A), string, A, A).
:- mode foldl(in(pred(in, di, uo) is det), in, di, uo) is det.
:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
:- mode foldl(in(pred(in, in, out) is semidet), in, in, out) is semidet.
:- mode foldl(in(pred(in, in, out) is nondet), in, in, out) is nondet.
:- mode foldl(in(pred(in, in, out) is multi), in, in, out) is multi.
% foldl2(Pred, String, !Acc1, !Acc2):
% A variant of foldl with two accumulators.
%
:- pred foldl2(pred(char, A, A, B, B), string, A, A, B, B).
:- mode foldl2(in(pred(in, di, uo, di, uo) is det),
in, di, uo, di, uo) is det.
:- mode foldl2(in(pred(in, in, out, di, uo) is det),
in, in, out, di, uo) is det.
:- mode foldl2(in(pred(in, in, out, in, out) is det),
in, in, out, in, out) is det.
:- mode foldl2(in(pred(in, in, out, in, out) is semidet),
in, in, out, in, out) is semidet.
:- mode foldl2(in(pred(in, in, out, in, out) is nondet),
in, in, out, in, out) is nondet.
:- mode foldl2(in(pred(in, in, out, in, out) is multi),
in, in, out, in, out) is multi.
% foldl_between(Pred, String, Start, End, !Acc)
% is equivalent to foldl(Pred, SubString, !Acc)
% where SubString = between(String, Start, End).
%
% Start and End are in terms of code units.
%
:- func foldl_between(func(char, A) = A, string, int, int, A) = A.
:- pred foldl_between(pred(char, A, A), string, int, int, A, A).
:- mode foldl_between(in(pred(in, in, out) is det), in, in, in,
in, out) is det.
:- mode foldl_between(in(pred(in, di, uo) is det), in, in, in,
di, uo) is det.
:- mode foldl_between(in(pred(in, in, out) is semidet), in, in, in,
in, out) is semidet.
:- mode foldl_between(in(pred(in, in, out) is nondet), in, in, in,
in, out) is nondet.
:- mode foldl_between(in(pred(in, in, out) is multi), in, in, in,
in, out) is multi.
% foldl2_between(Pred, String, Start, End, !Acc1, !Acc2)
% A variant of foldl_between with two accumulators.
%
% Start and End are in terms of code units.
%
:- pred foldl2_between(pred(char, A, A, B, B),
string, int, int, A, A, B, B).
:- mode foldl2_between(in(pred(in, di, uo, di, uo) is det),
in, in, in, di, uo, di, uo) is det.
:- mode foldl2_between(in(pred(in, in, out, di, uo) is det),
in, in, in, in, out, di, uo) is det.
:- mode foldl2_between(in(pred(in, in, out, in, out) is det),
in, in, in, in, out, in, out) is det.
:- mode foldl2_between(in(pred(in, in, out, in, out) is semidet),
in, in, in, in, out, in, out) is semidet.
:- mode foldl2_between(in(pred(in, in, out, in, out) is nondet),
in, in, in, in, out, in, out) is nondet.
:- mode foldl2_between(in(pred(in, in, out, in, out) is multi),
in, in, in, in, out, in, out) is multi.
% foldr(Pred, String, !Acc):
% As foldl/4, except that processing proceeds right-to-left.
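%
% For example, using the function version, collecting the code points
% from right to left preserves their original order:
%
% foldr(list.cons, "abc", []) = ['a', 'b', 'c']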
%
:- func foldr(func(char, T) = T, string, T) = T.
:- pred foldr(pred(char, T, T), string, T, T).
:- mode foldr(in(pred(in, in, out) is det), in, in, out) is det.
:- mode foldr(in(pred(in, di, uo) is det), in, di, uo) is det.
:- mode foldr(in(pred(in, in, out) is semidet), in, in, out) is semidet.
:- mode foldr(in(pred(in, in, out) is nondet), in, in, out) is nondet.
:- mode foldr(in(pred(in, in, out) is multi), in, in, out) is multi.
% foldr_between(Pred, String, Start, End, !Acc)
% is equivalent to foldr(Pred, SubString, !Acc)
% where SubString = between(String, Start, End).
%
% Start and End are in terms of code units.
%
:- func foldr_between(func(char, T) = T, string, int, int, T) = T.
:- pred foldr_between(pred(char, T, T), string, int, int, T, T).
:- mode foldr_between(in(pred(in, in, out) is det), in, in, in,
in, out) is det.
:- mode foldr_between(in(pred(in, di, uo) is det), in, in, in,
di, uo) is det.
:- mode foldr_between(in(pred(in, in, out) is semidet), in, in, in,
in, out) is semidet.
:- mode foldr_between(in(pred(in, in, out) is nondet), in, in, in,
in, out) is nondet.
:- mode foldr_between(in(pred(in, in, out) is multi), in, in, in,
in, out) is multi.
%--------------------------------------------------%
%
% Formatting tables.
%
:- type justified_column
---> left(list(string))
; right(list(string)).
% format_table(Columns, Separator) = Table:
%
% This function takes a list of columns and a column separator,
% and returns a formatted table, where
%
% - the N'th line contains the N'th string in each column;
% - that string will be padded to the width of the widest string
% in that column;
% - each field will be left justified within that width if the column
% has a "left()" wrapper, and right justified if it has a "right()"
% wrapper;
% - the fields on each line are separated with Separator;
% - successive lines are separated by newlines.
%
% There won't be a newline at the end of Table, to allow callers to decide
% whether they want to add one or not.
%
% This function considers the length of a string to be the number of
% code points in the string. Note that this is only an approximation:
% it will be inaccurate in the presence of e.g. combining characters.
%
% This function requires all the columns to contain the same number
% of strings, and throws an exception if this is not the case.
%
% An example:
%
% format_table([right(["a", "bb", "ccc"]), left(["1", "22", "333"])],
% " * ")
%
% would return the table:
%
%   a * 1
%  bb * 22
% ccc * 333
%
:- func format_table(list(justified_column), string) = string.
% format_table_max(Columns, Separator) does the same job as format_table,
% but allows the caller to associate a maximum width with each column.
% If some column had strings of e.g. lengths 18, 20, 35 and 45, then
% format_table would format that column as being 45 characters wide
% in all rows, but if a call to format_table_max specified 30 as the
% max width of that column, then format_table_max would format that column
% as being 30 characters wide in the first two rows, and would widen the
% column only when the value does not fit in the maximum, and in each case,
% it would widen the column only as much as necessary. In this example,
% the column would be 35 and 45 characters wide respectively in the
% last two rows.
%
:- func format_table_max(assoc_list(justified_column, maybe(int)), string)
= string.
%--------------------------------------------------%
%
% Converting strings to docs.
%
% Convert a string to a pretty_printer.doc for formatting.
%
:- func string_to_doc(string) = pretty_printer.doc.
:- pragma obsolete(func(string_to_doc/1), [pretty_printer.string_to_doc/1]).
%--------------------------------------------------%
%
% Converting strings to values of builtin types.
%
% Convert a string to an int. The string must contain only digits [0-9],
% optionally preceded by a plus or minus sign. If the string does
% not match this syntax or the number is not in the range
% [min_int + 1, max_int], to_int fails.
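%
% For example, these are true:
%
% to_int("42", 42)
% to_int("-7", -7)
% to_int("+5", 5)
%
% whereas to_int("4x", _) and to_int("", _) both fail.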
%
:- pred to_int(string::in, int::out) is semidet.
% Convert a signed base 10 string to an int. Throws an exception if the
% string argument does not match the regexp [+-]?[0-9]+ or the number is
% not in the range [min_int + 1, max_int].
%
:- func det_to_int(string) = int.
% Convert a string in the specified base (2-36) to an int. The string
% must contain one or more digits in the specified base, optionally
% preceded by a plus or minus sign. For bases > 10, digits 10 to 35
% are represented by the letters A-Z or a-z. If the string does not match
% this syntax or the number is not in the range [min_int, max_int],
% the predicate fails.
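%
% For example, these are true:
%
% base_string_to_int(16, "ff", 255)
% base_string_to_int(16, "FF", 255)
% base_string_to_int(2, "-101", -5)
%
% whereas base_string_to_int(2, "102", _) fails, because 2 is not
% a base 2 digit.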
%
:- pred base_string_to_int(int::in, string::in, int::out) is semidet.
% Convert a signed base N string to an int. Throws an exception
% if the string argument is not precisely an optional sign followed by
% a non-empty string of base N digits, or if the number is not in
% the range [min_int, max_int].
%
:- func det_base_string_to_int(int, string) = int.
%--------------------------------------------------%
% Convert a string to a uint. The string must contain only digits [0-9].
% If the string does not match this syntax or the number is not
% in the range [0, max_uint], to_uint fails.
%
:- pred to_uint(string::in, uint::out) is semidet.
% Convert a base 10 string to a uint. Throws an exception if the
% string argument does not match the regexp [0-9]+ or the number is
% not in the range [0, max_uint].
%
:- func det_to_uint(string) = uint.
% Convert a string in the specified base (2-36) to a uint. The string
% must contain one or more digits in the specified base. For bases > 10,
% digits 10 to 35 are represented by the letters A-Z or a-z. If the string
% does not match this syntax or the number is not in the range
% [0, max_uint], the predicate fails.
%
:- pred base_string_to_uint(int::in, string::in, uint::out) is semidet.
% Convert a base N string to a uint. Throws an exception
% if the string argument is not precisely a non-empty string of base N
% digits, or if the number is not in the range [0, max_uint].
%
:- func det_base_string_to_uint(int, string) = uint.
%--------------------------------------------------%
% Convert a string to a float, returning infinity or -infinity if the
% conversion overflows. Fails if the string is not a syntactically correct
% float literal.
%
:- pred to_float(string::in, float::out) is semidet.
% Convert a string to a float, returning infinity or -infinity if the
% conversion overflows. Throws an exception if the string is not a
% syntactically correct float literal.
%
:- func det_to_float(string) = float.
%--------------------------------------------------%
%
% Converting values of builtin types to strings.
%
%--------------------------------------------------%
% Converting chars to strings.
% char_to_string(Char, String):
%
% Converts a character to a string, or vice versa.
% True if String is the well-formed string that encodes the code point
% Char; or, if strings are UTF-16 encoded, Char is a surrogate code
% point and String is the string that contains only that surrogate code
% point. Otherwise, `char_to_string(Char, String)' is false.
%
% Throws an exception if Char is the null character or a code point that
% cannot be encoded in a string (namely, surrogate code points cannot be
% encoded in UTF-8 strings).
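%
% For example, these are true:
%
% char_to_string('a', "a")
% char_to_string(Char, "a") ==> Char = 'a'
%
% whereas char_to_string(_, "ab") and char_to_string(_, "") both fail,
% since neither string encodes exactly one code point.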
%
:- func char_to_string(char::in) = (string::uo) is det.
:- pred char_to_string(char, string).
:- mode char_to_string(in, uo) is det.
:- mode char_to_string(out, in) is semidet.
% A synonym for char_to_string/1.
%
:- func from_char(char::in) = (string::uo) is det.
%--------------------------------------------------%
% Converting integers to strings.
% The more complex conversions that build on the simpler conversions below.
% int_to_base_string(Int, Base, String):
%
% Convert an integer to a string in a given Base.
% String will consist of a minus sign (U+002D HYPHEN-MINUS)
% if Int is negative, followed by one or more decimal digits (0-9)
% or uppercase letters (A-Z). There will be no leading zeros.
%
% Base must be between 2 and 36, both inclusive; if it is not,
% the predicate will throw an exception.
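%
% For example, these are true:
%
% int_to_base_string(255, 16) = "FF"
% int_to_base_string(-10, 2) = "-1010"
% int_to_base_string(0, 8) = "0"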
%
:- func int_to_base_string(int::in, int::in) = (string::uo) is det.
:- pred int_to_base_string(int::in, int::in, string::uo) is det.
% Convert an integer to a string in base 10 with commas as thousand
% separators.
%
:- func int_to_string_thousands(int::in) = (string::uo) is det.
% int_to_base_string_group(Int, Base, GroupLength, Separator, String):
%
% Convert an integer to a string in a given Base,
% in the same format as int_to_base_string,
% with Separator inserted between every GroupLength digits
% (grouping from the end of the string).
% If GroupLength is less than one, no separators will appear
% in the output. Useful for formatting numbers like "1,300,000".
%
% Base must be between 2 and 36, both inclusive; if it is not,
% the predicate will throw an exception.
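%
% For example (the separators are arbitrary choices), these are true:
%
% int_to_base_string_group(1300000, 10, 3, ",") = "1,300,000"
% int_to_base_string_group(255, 2, 4, "_") = "1111_1111"
% int_to_base_string_group(1300000, 10, 0, ",") = "1300000"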
%
:- func int_to_base_string_group(int, int, int, string) = string.
:- mode int_to_base_string_group(in, in, in, in) = uo is det.
% The simpler conversions.
% Convert an integer to a string in base 10.
% See int_to_base_string for the string format.
%
:- func int_to_string(int::in) = (string::uo) is det.
:- pred int_to_string(int::in, string::uo) is det.
% A synonym for int_to_string/1.
%
:- func from_int(int::in) = (string::uo) is det.
% Convert an unsigned integer to a string in base 10.
%
:- func uint_to_string(uint::in) = (string::uo) is det.
% Convert an unsigned integer to a string in base 16.
% Alphabetic digits will be lowercase (e.g. a-f).
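%
% For example, uint_to_hex_string(255u) = "ff", whereas
% uint_to_uc_hex_string(255u) = "FF".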
%
:- func uint_to_hex_string(uint::in) = (string::uo) is det.
:- func uint_to_lc_hex_string(uint::in) = (string::uo) is det.
% Convert an unsigned integer to a string in base 16.
% Alphabetic digits will be uppercase (e.g. A-F).
%
:- func uint_to_uc_hex_string(uint::in) = (string::uo) is det.
% Convert an unsigned integer to a string in base 8.
%
:- func uint_to_octal_string(uint::in) = (string::uo) is det.
% Convert a signed/unsigned 8/16/32/64 bit integer to a string.
%
:- func int8_to_string(int8::in) = (string::uo) is det.
:- func uint8_to_string(uint8::in) = (string::uo) is det.
:- func int16_to_string(int16::in) = (string::uo) is det.
:- func uint16_to_string(uint16::in) = (string::uo) is det.
:- func int32_to_string(int32::in) = (string::uo) is det.
:- func uint32_to_string(uint32::in) = (string::uo) is det.
:- func int64_to_string(int64::in) = (string::uo) is det.
:- func uint64_to_string(uint64::in) = (string::uo) is det.
% Convert an unsigned 64-bit integer to a string in base 16.
% Alphabetic digits will be lowercase (e.g. a-f).
%
:- func uint64_to_hex_string(uint64::in) = (string::uo) is det.
:- func uint64_to_lc_hex_string(uint64::in) = (string::uo) is det.
% Convert an unsigned 64-bit integer to a string in base 16.
% Alphabetic digits will be uppercase (e.g. A-F).
%
:- func uint64_to_uc_hex_string(uint64::in) = (string::uo) is det.
% Convert an unsigned 64-bit integer to a string in base 8.
%
:- func uint64_to_octal_string(uint64::in) = (string::uo) is det.
%--------------------------------------------------%
% Converting floats to strings.
% Convert a float to a string.
% In the current implementation, the resulting string will be in the form
% that would be produced by printing the float using the format string
% "%#.<prec>g".
% <prec> will be in the range p to (p+2)
% where p = floor(mantissa_digits * log2(base_radix) / log2(10)).
% The precision chosen from this range will be such that converting
% the decimal string back to binary yields the original float.
%
:- func float_to_string(float::in) = (string::uo) is det.
:- pred float_to_string(float::in, string::uo) is det.
% A synonym for float_to_string/1.
%
:- func from_float(float::in) = (string::uo) is det.
%--------------------------------------------------%
% Converting c_pointers to strings.
% Convert a c_pointer to a string. The format is "c_pointer(0xXXXX)"
% where XXXX is the hexadecimal representation of the pointer.
%
:- func c_pointer_to_string(c_pointer::in) = (string::uo) is det.
:- pred c_pointer_to_string(c_pointer::in, string::uo) is det.
% A synonym for c_pointer_to_string/1.
%
:- func from_c_pointer(c_pointer::in) = (string::uo) is det.
%--------------------------------------------------%
%
% Converting values of arbitrary types to strings.
%
% string(X): Returns a canonicalized string representation of the value X
% using the standard Mercury operators.
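%
% For example (a sketch, assuming the usual Mercury term syntax;
% `yes/1' is the constructor from the maybe module):
%   string([1, 2, 3]) = "[1, 2, 3]"
%   string(yes(42)) = "yes(42)"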
%
:- func string(T) = string.
% As above, but using the supplied table of operators.
%
:- func string_ops(ops.table, T) = string.
% string_ops_noncanon(NonCanon, OpTable, X, String)
%
% As above, but the caller specifies what behaviour should occur for
% non-canonical terms (i.e. terms where multiple representations
% may compare as equal):
%
% - `do_not_allow' will throw an exception if (any subterm of)
% the argument is not canonical;
% - `canonicalize' will substitute a string indicating the presence
% of a non-canonical subterm;
% - `include_details_cc' will show the structure of any non-canonical
% subterms, but can only be called from a committed choice context.
%
:- pred string_ops_noncanon(noncanon_handling, ops.table, T, string).
:- mode string_ops_noncanon(in(do_not_allow), in, in, out) is det.
:- mode string_ops_noncanon(in(canonicalize), in, in, out) is det.
:- mode string_ops_noncanon(in(include_details_cc), in, in, out) is cc_multi.
:- mode string_ops_noncanon(in, in, in, out) is cc_multi.
%--------------------------------------------------%
%
% Converting values to strings based on a format string.
%
:- type poly_type
    --->    f(float)
    ;       i(int)
    ;       i8(int8)
    ;       i16(int16)
    ;       i32(int32)
    ;       i64(int64)
    ;       u(uint)
    ;       u8(uint8)
    ;       u16(uint16)
    ;       u32(uint32)
    ;       u64(uint64)
    ;       s(string)
    ;       c(char).
% A function similar to sprintf() in C.
%
% For example,
%   format("%s %i %c %.2f\n",
%       [s("Square-root of"), i(2), c('='), f(1.41)], String)
% will return
%   String = "Square-root of 2 = 1.41\n".
%
% The following options available in C are supported: flags [0+-# ],
% a field width (or *), and a precision (could be a ".*").
%
% Valid conversion character types are {dioxXucsfeEgGp%}. %n is not
% supported. format will not return the length of the string.
%
% conv   var          output form          effect of '#'
% char   type(s)
%
% d      int          signed integer
% i      int          signed integer
% o      int, uint    unsigned octal       with '0' prefix
% x,X    int, uint    unsigned hex         with '0x', '0X' prefix
% u      int, uint    unsigned integer
% c      char         character
% s      string       string
% f      float        rational number      with '.', if precision 0
% e,E    float        [-]m.dddddE+-xx      with '.', if precision 0
% g,G    float        either e or f        with trailing zeros
% p      int, uint    integer
%
% The valid conversion characters for int8, int16, int32 and int64
% are the same as for int, and the valid conversion characters for
% uint8, uint16, uint32 and uint64 are the same as for uint.
%
% A '0' flag will cause any padding to be zeros rather than spaces.
% A '-' will cause the output to be left-justified in its field.
% (Without a `-', the default is for fields to be right-justified.)
% A '+' forces a sign to be printed. This is not sensible for string
% and character output. A ' ' causes a space to be printed before a number
% if it has no sign. The other flag is '#', which modifies
% the output string's format. These flags are normally put directly
% after the '%'.
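%
% For example (a sketch of these flags, assuming the usual C sprintf()
% behaviour):
%   format("%5d|%-5d|%05d", [i(42), i(42), i(42)]) = "   42|42   |00042"
%   format("%+d|% d|%#x", [i(42), i(42), i(255)]) = "+42| 42|0xff"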
%
% Notes:
%
% %#.0e and %#.0E now print a '.' before the 'e'.
%
% Asking for more precision than a float actually has will result in
% potentially misleading output.
%
% Numbers are now rounded by precision value, not truncated as previously.
%
% The implementation uses the sprintf() function in C grades,
% so the actual output will depend on the C standard library.
%
:- func format(string, list(poly_type)) = string.
:- pred format(string::in, list(poly_type)::in, string::out) is det.
%--------------------------------------------------%
%--------------------------------------------------%