Next: table_statistics, Previous: string.builder, Up: Top [Contents]
%--------------------------------------------------% % vim: ts=4 sw=4 et ft=mercury %--------------------------------------------------% % Copyright (C) 1993-2012 The University of Melbourne. % Copyright (C) 2013-2024 The Mercury team. % This file is distributed under the terms specified in COPYING.LIB. %--------------------------------------------------% % % File: string.m. % Main authors: fjh, petdr, wangp. % Stability: medium to high. % % This module provides basic string handling facilities. % % Mercury strings are Unicode strings. They use either the UTF-8 or UTF-16 % encoding, depending on the target language. % % When Mercury is compiled to C, strings are UTF-8 encoded, with a null % character as the string terminator. A single code point requires one to four % bytes (code units) to encode. % % When Mercury is compiled to Java, strings are represented using Java's % String type. When Mercury is compiled to C#, strings are represented using % C#'s `System.String' type. Both of these types use the UTF-16 encoding. % A single code point requires one or two 16-bit integers (code units) % to encode. % % The Mercury compiler will only allow well-formed UTF-8 or UTF-16 string % constants. However, it is possible to produce strings containing invalid % UTF-8 or UTF-16 via I/O, foreign code, and substring operations. % Predicates or functions that inspect strings may fail, throw an exception, % or else behave in some special way when they encounter an ill-formed % code unit sequence. % % Unexpected null characters embedded in the middle of strings can be a source % of security vulnerabilities, so the Mercury library predicates and functions % which create strings from (lists of) characters throw an exception if they % detect such a null character. Programmers must not create strings that might % contain null characters using the foreign language interface. % % The builtin comparison operation on strings is also implementation % dependent. The current implementation performs string comparison using % % - C's strcmp() function, when compiling to C; % - Java's String.compareTo() method, when compiling to Java; and % - C#'s System.String.CompareOrdinal() method, when compiling to C#. % %--------------------------------------------------% % % This module is divided into several sections. These sections are: % % - Wrapper types that associate particular semantics with raw strings. % - Converting between strings and lists of characters. % - Reading characters from strings. % - Writing characters to strings. % - Determining the lengths of strings. % - Computing hashes of strings. % - Tests on strings. % - Appending strings. % - Splitting up strings. % - Dealing with prefixes and suffixes. % - Transformations of strings. % - Folds over the characters in strings. % - Formatting tables. % - Converting strings to docs. % - Converting strings to values of builtin types. % - Converting values of builtin types to strings. % - Converting values of arbitrary types to strings. % - Converting values to strings based on a format string. % %--------------------------------------------------% :- module string. :- interface. :- include_module builder. :- import_module assoc_list. :- import_module char. :- import_module deconstruct. :- import_module list. :- import_module maybe. :- import_module ops. :- import_module pretty_printer. %--------------------------------------------------% % % Wrapper types that associate particular semantics with raw strings. % % These types are used for defining stream typeclass instances % where you want different instances for strings representing different % semantic entities. Using the string type itself, without a wrapper, % would be ambiguous in such situations. % % While each module that associates semantics with strings could define % its own wrapper types, the notions of lines and text files are so common % that it is simpler to define them just once, and this is the logical % place to do that. % % A line is: % % - a possibly empty sequence of non-newline characters terminated by a % newline character; or % - a non-empty sequence of non-newline characters terminated by the end % of the file. % :- type line ---> line(string). % A text file is a possibly empty sequence of characters % terminated by the end of the file. % :- type text_file ---> text_file(string). %--------------------------------------------------% :- type string_encoding ---> utf8 ; utf16. % Return the internal string encoding on the current platform. % :- func internal_string_encoding = string_encoding. %--------------------------------------------------% % % Conversions between strings and lists of characters. % % Convert the string to a list of characters (code points). % % If strings use UTF-8 encoding, then each code unit in an ill-formed % sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list. % If strings use UTF-16 encoding, then each unpaired surrogate code point % is returned as a separate code point in the list. % :- func to_char_list(string) = list(char). :- pred to_char_list(string::in, list(char)::out) is det. % Convert the string to a list of characters (code points) in reverse % order. % % If strings use UTF-8 encoding, then each code unit in an ill-formed % sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list. % If strings use UTF-16 encoding, then each unpaired surrogate code point % is returned as a separate code point in the list. % :- func to_rev_char_list(string) = list(char). :- pred to_rev_char_list(string::in, list(char)::out) is det. % Convert a list of characters (code points) to a string. % Throws an exception if the list contains a null character or code point % that cannot be encoded in a string (namely, surrogate code points cannot % be encoded in UTF-8 strings). % :- func from_char_list(list(char)::in) = (string::uo) is det. :- pred from_char_list(list(char)::in, string::uo) is det. % As above, but fail instead of throwing an exception if the list contains % a null character or code point that cannot be encoded in a string. % :- pred semidet_from_char_list(list(char)::in, string::uo) is semidet. % Same as from_char_list, except that it reverses the order % of the characters. % Throws an exception if the list contains a null character or code point % that cannot be encoded in a string (namely, surrogate code points cannot % be encoded in UTF-8 strings). % :- func from_rev_char_list(list(char)::in) = (string::uo) is det. :- pred from_rev_char_list(list(char)::in, string::uo) is det. % As above, but fail instead of throwing an exception if the list contains % a null character or code point that cannot be encoded in a string. % :- pred semidet_from_rev_char_list(list(char)::in, string::uo) is semidet. % Convert a string into a list of code units of the string encoding used % by the current process. % :- pred to_code_unit_list(string::in, list(int)::out) is det. % Convert a string into a list of UTF-8 code units. % Throws an exception if the string contains an unpaired surrogate code % point, as the encoding of surrogate code points is prohibited in UTF-8. % :- pred to_utf8_code_unit_list(string::in, list(int)::out) is det. % Convert a string into a list of UTF-16 code units. % Throws an exception if strings use UTF-8 encoding and the given string % contains an ill-formed code unit sequence, as arbitrary bytes cannot be % represented in UTF-16 (even allowing for ill-formed sequences). % :- pred to_utf16_code_unit_list(string::in, list(int)::out) is det. % Convert a list of code units to a string. % Fails if the list does not contain a valid encoding of a string % (in the encoding expected by the current process), % or if the string would contain a null character. % :- pred from_code_unit_list(list(int)::in, string::uo) is semidet. % Convert a list of code units to a string. % The resulting string may contain ill-formed sequences. % Fails if the list contains a code unit that is out of range % or if the string would contain a null character. % :- pred from_code_unit_list_allow_ill_formed(list(int)::in, string::uo) is semidet. % Convert a list of UTF-8 code units to a string. % Fails if the list does not contain a valid encoding of a string % or if the string would contain a null character. % :- pred from_utf8_code_unit_list(list(int)::in, string::uo) is semidet. % Convert a list of UTF-16 code units to a string. % Fails if the list does not contain a valid encoding of a string % or if the string would contain a null character. % :- pred from_utf16_code_unit_list(list(int)::in, string::uo) is semidet. % duplicate_char(Char, Count, String): % % Construct a string consisting of Count occurrences of Char code points % in sequence, returning the empty string if Count is less than or equal % to zero. Throws an exception if Char is a null character or code point % that cannot be encoded in a string (namely, surrogate code points cannot % be encoded in UTF-8 strings). % :- func duplicate_char(char::in, int::in) = (string::uo) is det. :- pred duplicate_char(char::in, int::in, string::uo) is det. %--------------------------------------------------% % % Reading characters from strings. % % This type is used by the _repl indexing predicates to distinguish a % U+FFFD code point that is actually in a string from a U+FFFD code point % generated when the predicate encounters an ill-formed code unit sequence % in a UTF-8 string. % :- type maybe_replaced ---> not_replaced ; replaced_code_unit(uint8). % index(String, Index, Char): % % If Index is the initial code unit offset of a well-formed code unit % sequence in String then Char is the code point encoded by that % sequence. % % Otherwise, if Index is in range, Char is either a U+FFFD REPLACEMENT % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate % code point at Index (when strings are UTF-16 encoded). % % Fails if Index is out of range (negative, or greater than or equal to % the length of String). % :- pred index(string::in, int::in, char::uo) is semidet. % det_index(String, Index, Char): % % Like index/3 but throws an exception if Index is out of range % (negative, or greater than or equal to the length of String). % :- func det_index(string, int) = char. :- pred det_index(string::in, int::in, char::uo) is det. % unsafe_index(String, Index, Char): % % Like index/3 but does not check that Index is in range. % % WARNING: behavior is UNDEFINED if Index is out of range % (negative, or greater than or equal to the length of String). % This version is constant time, whereas det_index % may be linear in the length of the string. Use with care! % :- func unsafe_index(string, int) = char. :- pred unsafe_index(string::in, int::in, char::uo) is det. % A synonym for det_index/2: % String ^ elem(Index) = det_index(String, Index). % :- func string ^ elem(int) = char. % A synonym for unsafe_index/2: % String ^ unsafe_elem(Index) = unsafe_index(String, Index). % :- func string ^ unsafe_elem(int) = char. % index_next(String, Index, NextIndex, Char): % % Succeeds if and only if Index is between 0 and Len-1 (both inclusive) % where Len is the number of code units in String. % % If Index is the initial code unit offset of a well-formed code unit % sequence in String, then Char will be set to the code point encoded % by that sequence, and NextIndex will be set to the offset of the code % unit immediately following that sequence. % % If Index is *not* the initial code unit offset of a well-formed % code unit sequence, NextIndex will be set to Index + 1, but the value % of Char will depend on string encoding used by the target platform. % % - On platforms that encode strings using UTF-8 (i.e. when targeting C) % Char will be set to U+FFFD (the Unicode replacement character). % % - On platforms that encode strings using UTF-16 (i.e. when targeting % C# or Java), Char will be set to the unpaired surrogate code point % at Index. (For more details, see the comment just below.) % :- pred index_next(string::in, int::in, int::out, char::uo) is semidet. % index_next_repl(String, Index, NextIndex, Char, MaybeReplaced): % % Does the same job as index_next/4 but on success, it also returns % MaybeReplaced, which will specify whether Char is the result % of the replacement of a non-well-formed UTF-8 character with U+FFFD. % % On platforms that encode strings using UTF-8 (i.e. when targeting C), % there are three cases. % % - If Char is not U+FFFD, then MaybeReplaced will be `not_replaced'. % % - If Char is U+FFFD because there is a well-formed code point encoded % in String starting at Index, and that code point is U+FFFD, then % MaybeReplaced will also be `not_replaced'. % % - If Char is U+FFFD but there is *no* well formed code point encoded % in String starting at Index, then MaybeReplaced will be % `replaced_code_unit(CodeUnit)', where CodeUnit is the code unit % at offset Index in String. % % On platforms that encode strings using UTF-16 (i.e. when targeting C# % or Java), MaybeReplaced will always be bound to `not_replaced'. % The only ways that a UTF-16 string may be non-well-formed are % % - by having a high surrogate code unit (between 0xD800 and 0xDBFF) % that is not immediately followed by a low surrogate code unit % (between 0xDC00 and 0xDFFF), or % % - by having a low surrogate code unit that is not immediately preceded % by a high surrogate code unit. % % In both cases, index_next_repl will return the unpaired surrogate % unchanged as Char. There is no replacement required, because % % - surrogate code units are all the range 0xD800 to 0xDFFF, and % - the Unicode standard deliberately does not assign any characters % to the code points in this range. % % This means that if Char is in this range, then it must be an unpaired % surrogate, but since Char actually appears in String, it won't be % a *replacement* of another character. % :- pred index_next_repl(string::in, int::in, int::out, char::uo, maybe_replaced::out) is semidet. % unsafe_index_next(String, Index, NextIndex, Char): % % Like index_next/4 but does not check that Index is in range. % Fails if Index is equal to the length of String. % % WARNING: behavior is UNDEFINED if Index is out of range % (negative, or greater than the length of String). % :- pred unsafe_index_next(string::in, int::in, int::out, char::uo) is semidet. % unsafe_index_next_repl(String, Index, NextIndex, Char, MaybeReplaced): % % Like index_next_repl/5 but does not check that Index is in range. % Fails if Index is equal to the length of String. % % WARNING: behavior is UNDEFINED if Index is out of range % (negative, or greater than the length of String). % :- pred unsafe_index_next_repl(string::in, int::in, int::out, char::uo, maybe_replaced::out) is semidet. % prev_index(String, Index, PrevIndex, Char): % % If Index - 1 is the final code unit offset of a well-formed sequence in % String then Char is the code point encoded by that sequence, and % PrevIndex is the initial code unit offset of that sequence. % % Otherwise, if Index is in range, Char is either a U+FFFD REPLACEMENT % CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate % code point at Index - 1 (when strings are UTF-16 encoded), and % PrevIndex is Index - 1. % % Fails if Index is out of range (non-positive, or greater than the % length of String). % :- pred prev_index(string::in, int::in, int::out, char::uo) is semidet. % prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced): % % Like prev_index/4 but also returns MaybeReplaced on success. % When Char is not U+FFFD, then MaybeReplaced is always `not_replaced'. % When Char is U+FFFD (the Unicode replacement character), then there are % two cases: % % - If there is a U+FFFD code point encoded in String at % [PrevIndex, Index) then MaybeReplaced is `not_replaced'. % % - Otherwise, MaybeReplaced is `replaced_code_unit(CodeUnit)' where % CodeUnit is the code unit in String at Index - 1. % :- pred prev_index_repl(string::in, int::in, int::out, char::uo, maybe_replaced::out) is semidet. % unsafe_prev_index(String, Index, PrevIndex, Char): % % Like prev_index/4 but does not check that Index is in range. % Fails if Index is zero. % % WARNING: behavior is UNDEFINED if Index is out of range % (negative, or greater than the length of String). % :- pred unsafe_prev_index(string::in, int::in, int::out, char::uo) is semidet. % unsafe_prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced): % % Like prev_index_repl/5 but does not check that Index is in range. % Fails if Index is zero. % % WARNING: behavior is UNDEFINED if Index is out of range % (negative, or greater than the length of String). % :- pred unsafe_prev_index_repl(string::in, int::in, int::out, char::uo, maybe_replaced::out) is semidet. % unsafe_index_code_unit(String, Index, CodeUnit): % % CodeUnit is the code unit in String at the offset Index. % WARNING: behavior is UNDEFINED if Index is out of range % (negative, or greater than or equal to the length of String). % :- pred unsafe_index_code_unit(string::in, int::in, int::out) is det. %--------------------------------------------------% % % Writing characters to strings. % % set_char(Char, Index, String0, String): % % String is String0, with the code unit sequence beginning at Index % replaced by the encoding of Char. If the code unit at Index is the % initial code unit in a valid encoding of a code point, then that entire % code unit sequence is replaced. Otherwise, only the code unit at Index % is replaced. % % Fails if Index is out of range (negative, or greater than or equal to % the length of String0). % % Throws an exception if Char is the null character or a code point that % cannot be encoded in a string (namely, surrogate code points cannot be % encoded in UTF-8 strings). % :- pred set_char(char, int, string, string). :- mode set_char(in, in, in, out) is semidet. % NOTE This mode is disabled because the compiler puts constant strings % into static data even when they might be updated. % :- mode set_char(in, in, di, uo) is semidet. % det_set_char(Char, Index, String0, String): % % Same as set_char/4 but throws an exception if Index is out of range % (negative, or greater than or equal to the length of String0). % :- func det_set_char(char, int, string) = string. :- pred det_set_char(char, int, string, string). :- mode det_set_char(in, in, in, out) is det. % NOTE This mode is disabled because the compiler puts constant strings % into static data even when they might be updated. % :- mode det_set_char(in, in, di, uo) is det. % unsafe_set_char(Char, Index, String0, String): % % Same as set_char/4 but does not check if Index is in range. % WARNING: behavior is UNDEFINED if Index is out of range % (negative, or greater than or equal to the length of String0). % Use with care! % :- func unsafe_set_char(char, int, string) = string. :- mode unsafe_set_char(in, in, in) = out is det. % NOTE This mode is disabled because the compiler puts constant strings % into static data even when they might be updated. % :- mode unsafe_set_char(in, in, di) = uo is det. :- pred unsafe_set_char(char, int, string, string). :- mode unsafe_set_char(in, in, in, out) is det. % NOTE This mode is disabled because the compiler puts constant strings % into static data even when they might be updated. % :- mode unsafe_set_char(in, in, di, uo) is det. %--------------------------------------------------% % % Determining the lengths of strings. % % Determine the length of a string, in code units. % An empty string has length zero. % % NOTE: code points (characters) are encoded using one or more code units, % i.e. bytes for UTF-8; 16-bit integers for UTF-16. % :- func length(string::in) = (int::uo) is det. :- pred length(string, int). :- mode length(in, uo) is det. :- mode length(ui, uo) is det. % Synonyms for length. % :- func count_code_units(string) = int. :- pred count_code_units(string::in, int::out) is det. % Determine the number of code points in a string. % % Each valid code point, and each code unit that is part of an ill-formed % sequence, contributes one to the result. % (This matches the number of steps it would take to iterate over the % string using string.index_next or string.prev_index.) % :- func count_code_points(string) = int. :- pred count_code_points(string::in, int::out) is det. % NOTE We are changing all occurrences of "codepoint" in the % names of predicates and functions to "code_point", for consistency % with predicate and function names that talk about code_units. % :- func count_codepoints(string) = int. :- pred count_codepoints(string::in, int::out) is det. :- pragma obsolete(func(count_codepoints/1), [count_code_points/1]). :- pragma obsolete(pred(count_codepoints/2), [count_code_points/2]). % count_utf8_code_units(String) = Length: % % Return the number of code units required to represent a string in % UTF-8 encoding (with allowance for ill-formed sequences). % Equivalent to Length = length(to_utf8_code_unit_list(String)). % % Throws an exception if strings use UTF-16 encoding but the given string % contains an unpaired surrogate code point. Surrogate code points cannot % be represented in UTF-8. % :- func count_utf8_code_units(string) = int. % code_point_offset(String, StartOffset, Count, Offset): % % Let S be the substring of String from code unit StartOffset to the % end of the string. Offset is code unit offset after advancing Count % steps in S, where each step skips over either: % - one encoding of a Unicode code point, or % - one code unit that is part of an ill-formed sequence. % % Fails if StartOffset is out of range (negative, or greater than the % length of String), or if there are fewer than Count steps possible in S. % :- pred code_point_offset(string::in, int::in, int::in, int::out) is semidet. % NOTE We are changing all occurrences of "codepoint" in the % names of predicates and functions to "code_point", for consistency % with predicate and function names that talk about code_units. % :- pred codepoint_offset(string::in, int::in, int::in, int::out) is semidet. :- pragma obsolete(pred(codepoint_offset/4), [code_point_offset/4]). % code_point_offset(String, Count, Offset): % % Same as `code_point_offset(String, 0, Count, Offset)'. % :- pred code_point_offset(string::in, int::in, int::out) is semidet. % NOTE We are changing all occurrences of "codepoint" in the % names of predicates and functions to "code_point", for consistency % with predicate and function names that talk about code_units. % :- pred codepoint_offset(string::in, int::in, int::out) is semidet. :- pragma obsolete(pred(codepoint_offset/3), [code_point_offset/3]). %--------------------------------------------------% % % Computing hashes of strings. % % Compute a hash value for a string. % :- func hash(string) = int. :- pred hash(string::in, int::out) is det. % Two other hash functions for strings. % :- func hash2(string) = int. :- func hash3(string) = int. % Cross-compilation-friendly versions of hash, hash2 and hash3 % respectively. :- func hash4(string) = int. :- func hash5(string) = int. :- func hash6(string) = int. %--------------------------------------------------% % % Tests on strings. % % True if string is the empty string. % :- pred is_empty(string::in) is semidet. % True if the string is a valid UTF-8 or UTF-16 string. % In target languages that use UTF-8 string encoding, `is_well_formed(S)' % is true iff S consists of a well-formed UTF-8 code unit sequence. % In target languages that use UTF-16 string encoding, `is_well_formed(S)' % is true iff S consists of a well-formed UTF-16 code unit sequence. % :- pred is_well_formed(string::in) is semidet. % Values of this type record whether a string is well or ill formed. % In the latter case, the integer gives the offset in the string % (as a count of either UTF-8 or UTF-16 code units, depending on the % target language) of the first position at which the string departs % from well-formedness. % :- type well_or_ill_formed ---> well_formed ; ill_formed(int). % Does the same job as is_well_formed, but if the string is NOT well % formed, it will return the offset (as a count of code units) of the % first position at which the string departs from well-formedness. % :- pred check_well_formedness(string::in, well_or_ill_formed::out) is det. % True if string contains only alphabetic characters [A-Za-z]. % :- pred is_all_alpha(string::in) is semidet. % True if string contains only alphabetic characters [A-Za-z] and digits % [0-9]. % :- pred is_all_alnum(string::in) is semidet. % True if string contains only alphabetic characters [A-Za-z] and % underscores. % :- pred is_all_alpha_or_underscore(string::in) is semidet. % True if string contains only alphabetic characters [A-Za-z], % digits [0-9], and underscores. % :- pred is_all_alnum_or_underscore(string::in) is semidet. % True if the string contains only decimal digits (0-9). % :- pred is_all_digits(string::in) is semidet. % all_match(TestPred, String): % % True iff all code points in String satisfy TestPred, and String contains % no ill-formed code unit sequences. % :- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet. % contains_match(TestPred, String): % % True iff String contains at least one code point that satisfies % TestPred. Any ill-formed code unit sequences in String are ignored % as they do not encode code points. % :- pred contains_match(pred(char)::in(pred(in) is semidet), string::in) is semidet. % contains_char(String, Char): % % Succeed if the code point Char occurs in String. % Any ill-formed code unit sequences within String are ignored % as they will not contain Char. % :- pred contains_char(string::in, char::in) is semidet. % compare_substrings(Res, X, StartX, Y, StartY, Length): % % Compare two substrings by code unit order. The two substrings are % the substring of X between StartX and StartX + Length, and % the substring of Y between StartY and StartY + Length. % StartX, StartY and Length are all in terms of code units. % % Fails if StartX or StartX + Length are not within [0, length(X)], % or if StartY or StartY + Length are not within [0, length(Y)], % or if Length is negative. % :- pred compare_substrings(comparison_result::uo, string::in, int::in, string::in, int::in, int::in) is semidet. % unsafe_compare_substrings(Res, X, StartX, Y, StartY, Length): % % Same as compare_between/4 but without range checks. % WARNING: if any of StartX, StartY, StartX + Length or % StartY + Length are out of range, or if Length is negative, % then the behaviour is UNDEFINED. Use with care! % :- pred unsafe_compare_substrings(comparison_result::uo, string::in, int::in, string::in, int::in, int::in) is det. % compare_ignore_case_ascii(Res, X, Y): % % Compare two strings by code unit order, ignoring the case of letters % (A-Z, a-z) in the ASCII range. % Equivalent to `compare(Res, to_lower(X), to_lower(Y))' % but more efficient. % :- pred compare_ignore_case_ascii(comparison_result::uo, string::in, string::in) is det. % prefix_length(Pred, String): % % The length (in code units) of the maximal prefix of String consisting % entirely of code points satisfying Pred. % :- func prefix_length(pred(char)::in(pred(in) is semidet), string::in) = (int::out) is det. % suffix_length(Pred, String): % % The length (in code units) of the maximal suffix of String consisting % entirely of code points satisfying Pred. % :- func suffix_length(pred(char)::in(pred(in) is semidet), string::in) = (int::out) is det. % sub_string_search(String, SubString, Index): % % Index is the code unit position in String where the first % occurrence of SubString begins. Indices start at zero, so if % SubString is a prefix of String, this will return Index = 0. % :- pred sub_string_search(string::in, string::in, int::out) is semidet. % sub_string_search_start(String, SubString, BeginAt, Index): % % Index is the code unit position in String where the first % occurrence of SubString occurs such that 'Index' is greater than or % equal to BeginAt. Indices start at zero. % Fails if either BeginAt is negative, or greater than % length(String) - length(SubString). % :- pred sub_string_search_start(string::in, string::in, int::in, int::out) is semidet. % unsafe_sub_string_search_start(String, SubString, BeginAt, Index): % % Same as sub_string_search_start/4 but does not check that BeginAt % is in range. % WARNING: if BeginAt is either negative, or greater than length(String), % then the behaviour is UNDEFINED. Use with care! % :- pred unsafe_sub_string_search_start(string::in, string::in, int::in, int::out) is semidet. % find_first_char(String, Char, Index): % % Find the first occurrence of the code point Char in String. % On success, Index is the code unit offset of that code point. % :- pred find_first_char(string::in, char::in, int::out) is semidet. % find_first_char_start(String, Char, BeginAt, Index): % % Find the first occurrence of the code point Char in String, % beginning from the code unit offset BeginAt in String. % On success, Index is the code unit offset of that code point. % % Fails if BeginAt is out of range (negative, or greater than or equal % to the length of String). % :- pred find_first_char_start(string::in, char::in, int::in, int::out) is semidet. % unsafe_find_first_char_start(String, Char, BeginAt, Index): % % Same as find_first_char_start/4 but does not check that BeginAt % is in range. % WARNING: if BeginAt is either negative, or greater than length(String), % then the behaviour is UNDEFINED. Use with care! % :- pred unsafe_find_first_char_start(string::in, char::in, int::in, int::out) is semidet. % find_last_char(String, Char, Index): % % Find the last occurrence of the code point Char in String. % On success, Index is the code unit offset of that code point. % :- pred find_last_char(string::in, char::in, int::out) is semidet. %--------------------------------------------------% % % Appending strings. % % Append two strings together. % :- func append(string::in, string::in) = (string::uo) is det. % append(S1, S2, S3): % % Append two strings together. S3 consists of the code units of S1 % followed by the code units of S2, in order. % % An ill-formed code unit sequence at the end of S1 may join with an % ill-formed code unit sequence at the start of S2 to produce a valid % encoding of a code point in S3. % :- pred append(string, string, string). :- mode append(in, in, in) is semidet. % implied :- mode append(in, uo, in) is semidet. :- mode append(in, in, uo) is det. :- mode append(uo, in, in) is semidet. % nondet_append(S1, S2, S3): % % Non-deterministically return S1 and S2, where S1 ++ S2 = S3. % S3 is split after each code point or code unit in an ill-formed sequence. % :- pred nondet_append(string::out, string::out, string::in) is multi. % S1 ++ S2 = S :- append(S1, S2, S). % % Append two strings together using nicer inline syntax. % :- func string ++ string = string. :- mode in ++ in = uo is det. % Append a list of strings together. % :- func append_list(list(string)::in) = (string::uo) is det. :- pred append_list(list(string)::in, string::uo) is det. % join_list(Separator, Strings) = JoinedString: % % Append together the strings in Strings, putting Separator between % each pair of adjacent strings. If Strings is the empty list, % return the empty string. % :- func join_list(string::in, list(string)::in) = (string::uo) is det. %--------------------------------------------------% % % Making strings from smaller pieces. % :- type string_piece ---> string(string) ; substring(string, int, int). % string, start, end offset % append_string_pieces(Pieces, String): % % Append together the strings and substrings in Pieces into a string. % Throws an exception if Pieces contains an element % `substring(S, Start, End)' where Start or End are not within % the range [0, length(S)], or if Start > End. % :- pred append_string_pieces(list(string_piece)::in, string::uo) is det. % Same as append_string_pieces/2 but without range checks. % WARNING: if any piece `substring(S, Start, End)' has Start or End % outside the range [0, length(S)], or if Start > End, % then the behaviour is UNDEFINED. Use with care! % :- pred unsafe_append_string_pieces(list(string_piece)::in, string::uo) is det. %--------------------------------------------------% % % Splitting up strings. % % first_char(String, Char, Rest) is true iff String begins with a % well-formed code unit sequence, Char is the code point encoded by % that sequence, and Rest is the rest of String after that sequence. % % The (uo, in, in) mode throws an exception if Char cannot be encoded in % a string, or if Char is a surrogate code point (for consistency with % the other modes). % % WARNING: first_char makes a copy of Rest because the garbage collector % doesn't handle references into the middle of an object, at least not the % way we use it. This means that repeated use of first_char to iterate % over a string will result in very poor performance. If you want to % iterate over the characters in a string, use foldl or to_char_list % instead. % :- pred first_char(string, char, string). :- mode first_char(in, in, in) is semidet. % implied :- mode first_char(in, uo, in) is semidet. % implied :- mode first_char(in, in, uo) is semidet. % implied :- mode first_char(in, uo, uo) is semidet. :- mode first_char(uo, in, in) is det. % split(String, Index, LeftSubstring, RightSubstring): % % Split a string into two substrings at the code unit offset Index. % (If Index is out of the range [0, length of String], it is treated % as if it were the nearest end-point of that range.) % :- pred split(string::in, int::in, string::out, string::out) is det. % split_by_code_point(String, Count, LeftSubstring, RightSubstring): % % LeftSubstring is the left-most Count code points of String, % and RightSubstring is the remainder of String. % (If Count is out of the range [0, length of String], it is treated % as if it were the nearest end-point of that range.) % :- pred split_by_code_point(string::in, int::in, string::out, string::out) is det. % NOTE We are changing all occurrences of "codepoint" in the % names of predicates and functions to "code_point", for consistency % with predicate and function names that talk about code_units. % :- pred split_by_codepoint(string::in, int::in, string::out, string::out) is det. :- pragma obsolete(pred(split_by_codepoint/4), [split_by_code_point/4]). % left(String, Count, LeftSubstring): % % LeftSubstring is the left-most Count code units of String. % (If Count is out of the range [0, length of String], it is treated % as if it were the nearest end-point of that range.) % :- func left(string::in, int::in) = (string::out) is det. :- pred left(string::in, int::in, string::out) is det. % left_by_code_point(String, Count, LeftSubstring): % % LeftSubstring is the left-most Count code points of String. % (If Count is out of the range [0, length of String], it is treated % as if it were the nearest end-point of that range.) % :- func left_by_code_point(string::in, int::in) = (string::out) is det. :- pred left_by_code_point(string::in, int::in, string::out) is det. % NOTE We are changing all occurrences of "codepoint" in the % names of predicates and functions to "code_point", for consistency % with predicate and function names that talk about code_units. % :- func left_by_codepoint(string::in, int::in) = (string::out) is det. :- pred left_by_codepoint(string::in, int::in, string::out) is det. :- pragma obsolete(func(left_by_codepoint/2), [left_by_codepoint/2]). :- pragma obsolete(pred(left_by_codepoint/3), [left_by_codepoint/3]). % right(String, Count, RightSubstring): % % RightSubstring is the right-most Count code units of String. % (If Count is out of the range [0, length of String], it is treated % as if it were the nearest end-point of that range.) % :- func right(string::in, int::in) = (string::out) is det. :- pred right(string::in, int::in, string::out) is det. % right_by_code_point(String, Count, RightSubstring): % % RightSubstring is the right-most Count code points of String. % (If Count is out of the range [0, length of String], it is treated % as if it were the nearest end-point of that range.) % :- func right_by_code_point(string::in, int::in) = (string::out) is det. :- pred right_by_code_point(string::in, int::in, string::out) is det. % NOTE We are changing all occurrences of "codepoint" in the % names of predicates and functions to "code_point", for consistency % with predicate and function names that talk about code_units. % :- func right_by_codepoint(string::in, int::in) = (string::out) is det. :- pred right_by_codepoint(string::in, int::in, string::out) is det. :- pragma obsolete(func(right_by_codepoint/2), [right_by_codepoint/2]). :- pragma obsolete(pred(right_by_codepoint/3), [right_by_codepoint/3]). % between(String, Start, End, Substring): % % Substring consists of the segment of String within the half-open % interval [Start, End), where Start and End are code unit offsets. % (If Start is out of the range [0, length of String], it is treated % as if it were the nearest end-point of that range. % If End is out of the range [Start, length of String], % it is treated as if it were the nearest end-point of that range.) % :- func between(string::in, int::in, int::in) = (string::uo) is det. :- pred between(string::in, int::in, int::in, string::uo) is det. % between_code_points(String, Start, End, Substring): % % Substring is the part of String between the code point positions % Start and End. The result is equivalent to: % % between(String, StartOffset, EndOffset, Substring) % % where: % % StartOffset is from code_point_offset(String, Start, StartOffset) % if Start is in [0, count_code_points(String)], % StartOffset = 0 if Start < 0, % StartOffset = length(String) otherwise; % % EndOffset is from code_point_offset(String, End, EndOffset) % if End is in [0, count_code_points(String)], % EndOffset = 0 if End < 0, % EndOffset = length(String) otherwise. % % between/4 will enforce StartOffset =< EndOffset. % :- func between_code_points(string::in, int::in, int::in) = (string::uo) is det. :- pred between_code_points(string::in, int::in, int::in, string::uo) is det. % NOTE We are changing all occurrences of "codepoint" in the % names of predicates and functions to "code_point", for consistency % with predicate and function names that talk about code_units. % :- func between_codepoints(string::in, int::in, int::in) = (string::uo) is det. :- pred between_codepoints(string::in, int::in, int::in, string::uo) is det. :- pragma obsolete(func(between_codepoints/3), [between_code_points/3]). :- pragma obsolete(pred(between_codepoints/4), [between_code_points/4]). % unsafe_between(String, Start, End, Substring): % % Substring consists of the segment of String within the half-open % interval [Start, End), where Start and End are code unit offsets. % WARNING: if Start is out of the range [0, length of String] or % End is out of the range [Start, length of String] % then the behaviour is UNDEFINED. Use with care! % This version takes time proportional to the length of the substring, % whereas substring may take time proportional to the length % of the whole string. % :- func unsafe_between(string::in, int::in, int::in) = (string::uo) is det. :- pred unsafe_between(string::in, int::in, int::in, string::uo) is det. % words_separator(SepP, String) returns the list of non-empty % substrings of String (in first to last order) that are delimited % by non-empty sequences of code points matched by SepP. % For example, % % words_separator(char.is_whitespace, " the cat sat on the mat") = % ["the", "cat", "sat", "on", "the", "mat"] % % Note the difference to split_at_separator. % :- func words_separator(pred(char), string) = list(string). :- mode words_separator(in(pred(in) is semidet), in) = out is det. % words(String) = % words_separator(char.is_whitespace, String). % :- func words(string) = list(string). % split_at_separator(SepP, String) returns the list of (possibly empty) % substrings of String (in first to last order) that are delimited % by code points matched by SepP. For example, % % split_at_separator(char.is_whitespace, " the cat sat on the mat") % = ["", "the", "cat", "", "sat", "on", "the", "", "mat"] % % Note the difference to words_separator. % :- func split_at_separator(pred(char), string) = list(string). :- mode split_at_separator(in(pred(in) is semidet), in) = out is det. % split_at_char(Char, String) = % split_at_separator(unify(Char), String) % :- func split_at_char(char, string) = list(string). % split_at_string(Separator, String) returns the list of substrings % of String that are delimited by Separator. For example, % % split_at_string("|||", "|||fld2|||fld3") = ["", "fld2", "fld3"] % % Always the first match of Separator is used to break the String, for % example: split_at_string("aa", "xaaayaaaz") = ["x", "ay", "az"] % :- func split_at_string(string, string) = list(string). % split_into_lines(String) breaks String into a sequence of lines, % with each line consisting of a possibly empty sequence of non-newline % characters, followed either by a newline character, or by the end % of the string. The string returned for a line will not contain % the newline character. % :- func split_into_lines(string) = list(string). %--------------------------------------------------% % % Dealing with prefixes and suffixes. % % prefix(String, Prefix) is true iff Prefix is a prefix of String. % Same as append(Prefix, _, String). % :- pred prefix(string::in, string::in) is semidet. % suffix(String, Suffix) is true iff Suffix is a suffix of String. % Same as append(_, Suffix, String). % :- pred suffix(string::in, string::in) is semidet. % remove_prefix(Prefix, String, Suffix): % % This is a synonym for append(Prefix, Suffix, String) but with the % arguments in a more convenient order for use with higher-order code. % % WARNING: the argument order differs from remove_suffix. % :- pred remove_prefix(string::in, string::in, string::out) is semidet. % det_remove_prefix(Prefix, String, Suffix): % % This is a synonym for append(Prefix, Suffix, String) but with the % arguments in a more convenient order for use with higher-order code. % % WARNING: the argument order differs from remove_suffix. % :- pred det_remove_prefix(string::in, string::in, string::out) is det. % remove_prefix_if_present(Prefix, String) = Suffix returns String minus % Prefix if String begins with Prefix, and String if it doesn't. % :- func remove_prefix_if_present(string, string) = string. % add_prefix(Prefix, Str) = PrefixStr: % % Does the same job as Prefix ++ Str = PrefixStr, but allows % using list.map to add the same prefix to many strings. % :- func add_prefix(string, string) = string. % remove_suffix(String, Suffix, Prefix): % % The same as append(Prefix, Suffix, String). % % WARNING: the argument order differs from both remove_prefix and % remove_suffix_if_present. % :- pred remove_suffix(string::in, string::in, string::out) is semidet. % det_remove_suffix(String, Suffix) returns the same value as % remove_suffix, except it throws an exception if String does not end % with Suffix. % % WARNING: the argument order differs from both remove_prefix and % remove_suffix_if_present. % :- func det_remove_suffix(string, string) = string. % remove_suffix_if_present(Suffix, String) returns String minus Suffix % if String ends with Suffix, and String if it doesn't. % % WARNING: the argument order differs from remove_suffix and % det_remove_suffix. % :- func remove_suffix_if_present(string, string) = string. % add_suffix(Suffix, Str) = StrSuffix: % % Does the same job as Str ++ Suffix = StrSuffix, but allows % using list.map to add the same suffix to many strings. % :- func add_suffix(string, string) = string. %--------------------------------------------------% % % Transformations of strings. % % Convert the first character (if any) of a string to uppercase. % Only letters (a-z) in the ASCII range are converted. % % This function transforms the initial code point of a string, % whether or not the code point occurs as part of a combining sequence. % :- func capitalize_first(string) = string. :- pred capitalize_first(string::in, string::out) is det. % Convert the first character (if any) of a string to lowercase. % Only letters (A-Z) in the ASCII range are converted. % % This function transforms the initial code point of a string, % whether or not the code point occurs as part of a combining sequence. % :- func uncapitalize_first(string) = string. :- pred uncapitalize_first(string::in, string::out) is det. % Converts a string to uppercase. % Only letters (A-Z) in the ASCII range are converted. % % This function transforms each code point individually. % Letters that occur within a combining sequence will be converted, % whereas the precomposed character equivalent to the combining % sequence would not be converted. For example: % % to_upper("a\u0301") ==> "A\u0301" % á decomposed % to_upper("\u00E1") ==> "\u00E1" % á precomposed % :- func to_upper(string::in) = (string::uo) is det. :- pred to_upper(string, string). :- mode to_upper(in, uo) is det. :- mode to_upper(in, in) is semidet. % implied % Converts a string to lowercase. % Only letters (a-z) in the ASCII range are converted. % % This function transforms each code point individually. % Letters that occur within a combining sequence will be converted, % whereas the precomposed character equivalent to the combining % sequence would not be converted. For example: % % to_lower("A\u0301") ==> "a\u0301" % à decomposed % to_lower("\u00C1") ==> "\u00C1" % à precomposed % :- func to_lower(string::in) = (string::uo) is det. :- pred to_lower(string, string). :- mode to_lower(in, uo) is det. :- mode to_lower(in, in) is semidet. % implied % pad_left(String0, PadChar, Width, String): % % Insert PadChars at the left of String0 until it is at least as long % as Width, giving String. Width is currently measured as the number % of code points. % :- func pad_left(string, char, int) = string. :- pred pad_left(string::in, char::in, int::in, string::out) is det. % pad_right(String0, PadChar, Width, String): % % Insert PadChars at the right of String0 until it is at least as long % as Width, giving String. Width is currently measured as the number % of code points. % :- func pad_right(string, char, int) = string. :- pred pad_right(string::in, char::in, int::in, string::out) is det. % chomp(String): % % Return String minus any single trailing newline character. % :- func chomp(string) = string. % strip(String): % % Returns String minus any initial and trailing ASCII whitespace % characters, i.e. characters satisfying char.is_whitespace. % :- func strip(string) = string. % lstrip(String): % % Return String minus any initial ASCII whitespace characters, % i.e. characters satisfying char.is_whitespace. % :- func lstrip(string) = string. % rstrip(String): % % Returns String minus any trailing ASCII whitespace characters, % i.e. characters satisfying char.is_whitespace. % :- func rstrip(string) = string. % lstrip_pred(Pred, String): % % Returns String minus the maximal prefix consisting entirely % of code points satisfying Pred. % :- func lstrip_pred(pred(char)::in(pred(in) is semidet), string::in) = (string::out) is det. % rstrip_pred(Pred, String): % % Returns String minus the maximal suffix consisting entirely % of code points satisfying Pred. % :- func rstrip_pred(pred(char)::in(pred(in) is semidet), string::in) = (string::out) is det. % replace(String0, Pattern, Subst, String): % % Replaces the first occurrence of Pattern in String0 with Subst to give % String. Fails if Pattern does not occur in String0. % :- pred replace(string::in, string::in, string::in, string::uo) is semidet. % replace_all(String0, Pattern, Subst, String): % % Replaces any occurrences of Pattern in String0 with Subst to give % String. % % If Pattern is the empty string then Subst is inserted at every point % in String0 except between two code units in an encoding of a code point. % For example, these are true: % % replace_all("", "", "|", "|") % replace_all("a", "", "|", "|a|") % replace_all("ab", "", "|", "|a|b|") % :- func replace_all(string::in, string::in, string::in) = (string::uo) is det. :- pred replace_all(string::in, string::in, string::in, string::uo) is det. % word_wrap(Str, LineLen) = Wrapped: % % Wrapped is Str with newlines inserted between words (separated by ASCII % space characters) so that at most LineLen code points appear on any line, % and each line contains as many whole words as possible subject to that % constraint. If any one word exceeds LineLen code points in length, then % it will be broken over two (or more) lines. Sequences of whitespace % characters are replaced by a single space. % % See char.is_whitespace for the definition of whitespace characters % used by this predicate. % :- func word_wrap(string, int) = string. % word_wrap_separator(Str, LineLen, BrokenWordSeparator) = Wrapped: % % word_wrap_separator/3 is like word_wrap/2, except that words that % need to be broken up over multiple lines have BrokenWordSeparator % inserted between each pair of pieces. If the number of code points in % BrokenWordSeparator is greater than or equal to LineLen, then this % function ignores the separator, since it would leave no room on a line % for any actual words. % :- func word_wrap_separator(string, int, string) = string. %--------------------------------------------------% % % Folds over the characters in strings. % % foldl(Pred, String, !Acc): % % Pred is an accumulator predicate which is to be called for each % code point of the string String in turn. % If String contains ill-formed sequences, Pred is called for each % code unit in an ill-formed sequence. If strings use UTF-8 encoding, % U+FFFD is passed to Pred in place of each such code unit. % If strings use UTF-16 encoding, each code unit in an ill-formed sequence % is an unpaired surrogate code point, which will be passed to Pred. % % The initial value of the accumulator is !.Acc and the final value is % !:Acc. % (foldl(Pred, String, !Acc) is equivalent to % to_char_list(String, Chars), % list.foldl(Pred, Chars, !Acc) % but is implemented more efficiently.) % :- func foldl(func(char, A) = A, string, A) = A. :- pred foldl(pred(char, A, A), string, A, A). :- mode foldl(in(pred(in, di, uo) is det), in, di, uo) is det. :- mode foldl(in(pred(in, in, out) is det), in, in, out) is det. :- mode foldl(in(pred(in, in, out) is semidet), in, in, out) is semidet. :- mode foldl(in(pred(in, in, out) is nondet), in, in, out) is nondet. :- mode foldl(in(pred(in, in, out) is multi), in, in, out) is multi. % foldl2(Pred, String, !Acc1, !Acc2): % A variant of foldl with two accumulators. % :- pred foldl2(pred(char, A, A, B, B), string, A, A, B, B). :- mode foldl2(in(pred(in, di, uo, di, uo) is det), in, di, uo, di, uo) is det. :- mode foldl2(in(pred(in, in, out, di, uo) is det), in, in, out, di, uo) is det. :- mode foldl2(in(pred(in, in, out, in, out) is det), in, in, out, in, out) is det. :- mode foldl2(in(pred(in, in, out, in, out) is semidet), in, in, out, in, out) is semidet. :- mode foldl2(in(pred(in, in, out, in, out) is nondet), in, in, out, in, out) is nondet. :- mode foldl2(in(pred(in, in, out, in, out) is multi), in, in, out, in, out) is multi. % foldl_between(Pred, String, Start, End, !Acc) % is equivalent to foldl(Pred, SubString, !Acc) % where SubString = between(String, Start, End). % % Start and End are in terms of code units. % :- func foldl_between(func(char, A) = A, string, int, int, A) = A. :- pred foldl_between(pred(char, A, A), string, int, int, A, A). :- mode foldl_between(in(pred(in, in, out) is det), in, in, in, in, out) is det. :- mode foldl_between(in(pred(in, di, uo) is det), in, in, in, di, uo) is det. :- mode foldl_between(in(pred(in, in, out) is semidet), in, in, in, in, out) is semidet. :- mode foldl_between(in(pred(in, in, out) is nondet), in, in, in, in, out) is nondet. :- mode foldl_between(in(pred(in, in, out) is multi), in, in, in, in, out) is multi. % foldl2_between(Pred, String, Start, End, !Acc1, !Acc2) % A variant of foldl_between with two accumulators. % % Start and End are in terms of code units. % :- pred foldl2_between(pred(char, A, A, B, B), string, int, int, A, A, B, B). :- mode foldl2_between(in(pred(in, di, uo, di, uo) is det), in, in, in, di, uo, di, uo) is det. :- mode foldl2_between(in(pred(in, in, out, di, uo) is det), in, in, in, in, out, di, uo) is det. :- mode foldl2_between(in(pred(in, in, out, in, out) is det), in, in, in, in, out, in, out) is det. :- mode foldl2_between(in(pred(in, in, out, in, out) is semidet), in, in, in, in, out, in, out) is semidet. :- mode foldl2_between(in(pred(in, in, out, in, out) is nondet), in, in, in, in, out, in, out) is nondet. :- mode foldl2_between(in(pred(in, in, out, in, out) is multi), in, in, in, in, out, in, out) is multi. % foldr(Pred, String, !Acc): % As foldl/4, except that processing proceeds right-to-left. % :- func foldr(func(char, T) = T, string, T) = T. :- pred foldr(pred(char, T, T), string, T, T). :- mode foldr(in(pred(in, in, out) is det), in, in, out) is det. :- mode foldr(in(pred(in, di, uo) is det), in, di, uo) is det. :- mode foldr(in(pred(in, in, out) is semidet), in, in, out) is semidet. :- mode foldr(in(pred(in, in, out) is nondet), in, in, out) is nondet. :- mode foldr(in(pred(in, in, out) is multi), in, in, out) is multi. % foldr_between(Pred, String, Start, End, !Acc) % is equivalent to foldr(Pred, SubString, !Acc) % where SubString = between(String, Start, End). % % Start and End are in terms of code units. % :- func foldr_between(func(char, T) = T, string, int, int, T) = T. :- pred foldr_between(pred(char, T, T), string, int, int, T, T). :- mode foldr_between(in(pred(in, in, out) is det), in, in, in, in, out) is det. :- mode foldr_between(in(pred(in, di, uo) is det), in, in, in, di, uo) is det. :- mode foldr_between(in(pred(in, in, out) is semidet), in, in, in, in, out) is semidet. :- mode foldr_between(in(pred(in, in, out) is nondet), in, in, in, in, out) is nondet. :- mode foldr_between(in(pred(in, in, out) is multi), in, in, in, in, out) is multi. %--------------------------------------------------% % % Formatting tables. % :- type justified_column ---> left(list(string)) ; right(list(string)). % format_table(Columns, Separator) = Table: % % This function takes a list of columns and a column separator, % and returns a formatted table, where % % - the N'th line contains the N'th string in each column; % - that string will be padded to the width of the widest string % in that column; % - each field will be left justfied within that width if the column % has a "left()" wrapper, and right justified if it has a "right()" % wrapper; % - the fields on each line are separated with Separator; % - successive lines are separated by newlines. % % There won't be a newline at the end of Table, to allow callers to decide % whether they want to add one or not. % % This predicate considers the length of a string to be the number of % code points in the string. Note that this is only an approximation: % it will be unaccurate in the presence of e.g. combining characters. % % This predicate requires all the columns to contain the same number % of strings, and throws an exception if this is not the case. % % An example: % % format_table([right(["a", "bb", "ccc"]), left(["1", "22", "333"])], % " * ") % % would return the table: % % a * 1 % bb * 22 % ccc * 333 % :- func format_table(list(justified_column), string) = string. % format_table_max(Columns, Separator) does the same job as format_table, % but allows the caller to associate a maximum width with each column. % If some column had strings of e.g. lengths 18, 20, 35 and 45, then % format_table would format that column as being 45 character wide % in all rows, but if a call to format_table_max specified 30 as the % max width of that column, then format_table_max would format that column % as being 30 character wide in the first two rows, and would widen the % column only when the value does not fit in the maximum, and in each case, % it would widen the column only as much as necessary. In this example, % the column would be 35 and 45 characters wide respectively in the % last two rows. % :- func format_table_max(assoc_list(justified_column, maybe(int)), string) = string. %--------------------------------------------------% % % Converting strings to docs. % % Convert a string to a pretty_printer.doc for formatting. % :- func string_to_doc(string) = pretty_printer.doc. :- pragma obsolete(func(string_to_doc/1), [pretty_printer.string_to_doc/1]). %--------------------------------------------------% % % Converting strings to values of builtin types. % % Convert a string to an int. The string must contain only digits [0-9], % optionally preceded by a plus or minus sign. If the string does % not match this syntax or the number is not in the range % [min_int + 1, max_int], to_int fails. % :- pred to_int(string::in, int::out) is semidet. % Convert a signed base 10 string to an int. Throws an exception if the % string argument does not match the regexp [+-]?[0-9]+ or the number is % not in the range [min_int + 1, max_int]. % :- func det_to_int(string) = int. % Convert a string in the specified base (2-36) to an int. The string % must contain one or more digits in the specified base, optionally % preceded by a plus or minus sign. For bases > 10, digits 10 to 35 % are represented by the letters A-Z or a-z. If the string does not match % this syntax or the number is not in the range [min_int, max_int], % the predicate fails. % :- pred base_string_to_int(int::in, string::in, int::out) is semidet. % Convert a signed base N string to an int. Throws an exception % if the string argument is not precisely an optional sign followed by % a non-empty string of base N digits, or if the number is not in % the range [min_int, max_int]. % :- func det_base_string_to_int(int, string) = int. %--------------------------------------------------% % Convert a string to a uint. The string must contain only digits [0-9]. % If the string does not match this syntax or the number is not % in the range [0, max_uint], to_uint fails. % :- pred to_uint(string::in, uint::out) is semidet. % Convert a signed base 10 string to a uint. Throws an exception if the % string argument does not match the regexp [0-9]+ or the number is % not in the range [0, max_uint]. % :- func det_to_uint(string) = uint. % Convert a string in the specified base (2-36) to a uint. The string % must contain one or more digits in the specified base. For bases > 10, % digits 10 to 35 are represented by the letters A-Z or a-z. If the string % does not match this syntax or the number is not in the range % [0, max_uint], the predicate fails. % :- pred base_string_to_uint(int::in, string::in, uint::out) is semidet. % Convert a signed base N string to a uint. Throws an exception % if the string argument is not precisely a non-empty string of base N % digits, or if the number is not in the range [0, max_uint]. % :- func det_base_string_to_uint(int, string) = uint. %--------------------------------------------------% % Convert a string to a float, returning infinity or -infinity if the % conversion overflows. Fails if the string is not a syntactically correct % float literal. % :- pred to_float(string::in, float::out) is semidet. % Convert a string to a float, returning infinity or -infinity if the % conversion overflows. Throws an exception if the string is not a % syntactically correct float literal. % :- func det_to_float(string) = float. %--------------------------------------------------% % % Converting values of builtin types to strings. % % char_to_string(Char, String): % % Converts a character to a string, or vice versa. % True if String is the well-formed string that encodes the code point % Char; or, if strings are UTF-16 encoded, Char is a surrogate code % point and String is the string that contains only that surrogate code % point. Otherwise, `char_to_string(Char, String)' is false. % % Throws an exception if Char is the null character or a code point that % cannot be encoded in a string (namely, surrogate code points cannot be % encoded in UTF-8 strings). % :- func char_to_string(char::in) = (string::uo) is det. :- pred char_to_string(char, string). :- mode char_to_string(in, uo) is det. :- mode char_to_string(out, in) is semidet. % A synonym for char_to_string/1. % :- func from_char(char::in) = (string::uo) is det. % Convert an integer to a string in base 10. % See int_to_base_string for the string format. % :- func int_to_string(int::in) = (string::uo) is det. :- pred int_to_string(int::in, string::uo) is det. % A synonym for int_to_string/1. % :- func from_int(int::in) = (string::uo) is det. % int_to_base_string(Int, Base, String): % % Convert an integer to a string in a given Base. % String will consist of a minus sign (U+002D HYPHEN-MINUS) % if Int is negative, followed by one or more decimal digits (0-9) % or uppercase letters (A-Z). There will be no leading zeros. % % Base must be between 2 and 36, both inclusive; if it is not, % the predicate will throw an exception. % :- func int_to_base_string(int::in, int::in) = (string::uo) is det. :- pred int_to_base_string(int::in, int::in, string::uo) is det. % Convert an integer to a string in base 10 with commas as thousand % separators. % :- func int_to_string_thousands(int::in) = (string::uo) is det. % int_to_base_string_group(Int, Base, GroupLength, Separator, String): % % Convert an integer to a string in a given Base, % in the same format as int_to_base_string, % with Separator inserted between every GroupLength digits % (grouping from the end of the string). % If GroupLength is less than one, no separators will appear % in the output. Useful for formatting numbers like "1,300,000". % % Base must be between 2 and 36, both inclusive; if it is not, % the predicate will throw an exception. % :- func int_to_base_string_group(int, int, int, string) = string. :- mode int_to_base_string_group(in, in, in, in) = uo is det. % Convert an unsigned integer to a string in base 10. % :- func uint_to_string(uint::in) = (string::uo) is det. % Convert an unsigned integer to a string in base 16. % Alphabetic digits will be lowercase (e.g. a-f). % :- func uint_to_hex_string(uint::in) = (string::uo) is det. :- func uint_to_lc_hex_string(uint::in) = (string::uo) is det. % Convert an unsigned integer to a string in base 16. % Alphabetic digits will be uppercase (e.g. A-F). % :- func uint_to_uc_hex_string(uint::in) = (string::uo) is det. % Convert an unsigned integer to a string in base 8. % :- func uint_to_octal_string(uint::in) = (string::uo) is det. % Convert a signed/unsigned 8/16/32/64 bit integer to a string. % :- func int8_to_string(int8::in) = (string::uo) is det. :- func uint8_to_string(uint8::in) = (string::uo) is det. :- func int16_to_string(int16::in) = (string::uo) is det. :- func uint16_to_string(uint16::in) = (string::uo) is det. :- func int32_to_string(int32::in) = (string::uo) is det. :- func uint32_to_string(uint32::in) = (string::uo) is det. :- func int64_to_string(int64::in) = (string::uo) is det. :- func uint64_to_string(uint64::in) = (string::uo) is det. % Convert an unsigned 64-bit integer to a string in base 16. % Alphabetic digits will be lowercase (e.g. a-f). % :- func uint64_to_hex_string(uint64::in) = (string::uo) is det. :- func uint64_to_lc_hex_string(uint64::in) = (string::uo) is det. % Convert an unsigned 64-bit integer to a string in base 16. % Alphabetic digits will be uppercase (e.g. A-F). % :- func uint64_to_uc_hex_string(uint64::in) = (string::uo) is det. % Convert an unsigned 64-bit integer to a string in base 8. % :- func uint64_to_octal_string(uint64::in) = (string::uo) is det. % Convert a float to a string. % In the current implementation, the resulting float will be in the form % that it was printed using the format string "%#.<prec>g". % <prec> will be in the range p to (p+2) % where p = floor(mantissa_digits * log2(base_radix) / log2(10)). % The precision chosen from this range will be such as to allow % a successful decimal -> binary conversion of the float. % :- func float_to_string(float::in) = (string::uo) is det. :- pred float_to_string(float::in, string::uo) is det. % A synonym for float_to_string/1. % :- func from_float(float::in) = (string::uo) is det. % Convert a c_pointer to a string. The format is "c_pointer(0xXXXX)" % where XXXX is the hexadecimal representation of the pointer. % :- func c_pointer_to_string(c_pointer::in) = (string::uo) is det. :- pred c_pointer_to_string(c_pointer::in, string::uo) is det. % A synonym for c_pointer_to_string/1. % :- func from_c_pointer(c_pointer::in) = (string::uo) is det. %--------------------------------------------------% % % Converting values of arbitrary types to strings. % % string(X): Returns a canonicalized string representation of the value X % using the standard Mercury operators. % :- func string(T) = string. % As above, but using the supplied table of operators. % :- func string_ops(ops.table, T) = string. % string_ops_noncanon(NonCanon, OpTable, X, String) % % As above, but the caller specifies what behaviour should occur for % non-canonical terms (i.e. terms where multiple representations % may compare as equal): % % - `do_not_allow' will throw an exception if (any subterm of) % the argument is not canonical; % - `canonicalize' will substitute a string indicating the presence % of a non-canonical subterm; % - `include_details_cc' will show the structure of any non-canonical % subterms, but can only be called from a committed choice context. % :- pred string_ops_noncanon(noncanon_handling, ops.table, T, string). :- mode string_ops_noncanon(in(do_not_allow), in, in, out) is det. :- mode string_ops_noncanon(in(canonicalize), in, in, out) is det. :- mode string_ops_noncanon(in(include_details_cc), in, in, out) is cc_multi. :- mode string_ops_noncanon(in, in, in, out) is cc_multi. %--------------------------------------------------% % % Converting values to strings based on a format string. % :- type poly_type ---> f(float) ; i(int) ; i8(int8) ; i16(int16) ; i32(int32) ; i64(int64) ; u(uint) ; u8(uint8) ; u16(uint16) ; u32(uint32) ; u64(uint64) ; s(string) ; c(char). % A function similar to sprintf() in C. % % For example, % format("%s %i %c %f\n", % [s("Square-root of"), i(2), c('='), f(1.41)], String) % will return % String = "Square-root of 2 = 1.41\n". % % The following options available in C are supported: flags [0+-# ], % a field width (or *), and a precision (could be a ".*"). % % Valid conversion character types are {dioxXucsfeEgGp%}. %n is not % supported. format will not return the length of the string. % % conv var output form. effect of '#'. % char. type(s). % % d int signed integer % i int signed integer % o int, uint unsigned octal with '0' prefix % x,X int, uint unsigned hex with '0x', '0X' prefix % u int, uint unsigned integer % c char character % s string string % f float rational number with '.', if precision 0 % e,E float [-]m.dddddE+-xx with '.', if precision 0 % g,G float either e or f with trailing zeros. % p int, uint integer % % The valid conversion characters for int8, int16, int32 and int64 % are the same as for int, and the valid conversion characters for % uint8, uint16, uint32 and uint64 are the same as for uint. % % An option of zero will cause any padding to be zeros rather than spaces. % A '-' will cause the output to be left-justified in its 'space'. % (Without a `-', the default is for fields to be right-justified.) % A '+' forces a sign to be printed. This is not sensible for string % and character output. A ' ' causes a space to be printed before a thing % if there is no sign there. The other option is the '#', which modifies % the output string's format. These options are normally put directly % after the '%'. % % Notes: % % %#.0e, %#.0E now prints a '.' before the 'e'. % % Asking for more precision than a float actually has will result in % potentially misleading output. % % Numbers are now rounded by precision value, not truncated as previously. % % The implementation uses the sprintf() function in C grades, % so the actual output will depend on the C standard library. % :- func format(string, list(poly_type)) = string. :- pred format(string::in, list(poly_type)::in, string::out) is det. %--------------------------------------------------% %--------------------------------------------------%
Next: table_statistics, Previous: string.builder, Up: Top [Contents]