Next: float, Previous: fat_sparse_bitset, Up: Top [Contents]
%--------------------------------------------------%
% vim: ts=4 sw=4 et ft=mercury
%--------------------------------------------------%
% Copyright (C) 2011-2012 The University of Melbourne.
% Copyright (C) 2014, 2016-2022, 2024-2025 The Mercury team.
% This file is distributed under the terms specified in COPYING.LIB.
%--------------------------------------------------%
%
% File: fatter_sparse_bitset.m.
% Author: zs.
% Stability: high.
%
% This module provides an abstract data type for storing sets of items
% that can each be represented by unsigned integers.
% If the integers being stored are closely grouped, a sparse_bitset
% will be much more compact than either the list-of-elements representations
% provided by set.m, set_ordlist.m, and set_unordlist.m, or the
% tree-of-elements representations provided by set_bbbtree.m, set_tree234.
% and set_ctree234.m.
%
% A sparse bitset is represented as a sorted list, with each element
% of this list containing two unsigned integers: Offset and Bits.
% Offset will always be a multiple of uint.ubits_per_uint, and
% the bits of Bits describe which of the elements of the range
% Offset .. (Offset + ubits_per_uint - 1) are in the set.
% The value of Bits must not be zero; any operation that would clear
% all the bits in Bits must also delete the whole list element.
% As one goes from the head towards the tail of the list, the offsets of
% the list elements must strictly increase.
%
% The values of Offset in the list need not be *contiguous* multiples
% of ubits_per_uint, hence the name *sparse* bitset.
%
% A sparse_bitset is suitable for storing sets of integers which
% can be represented using only a few Offset/Bits pairs.
% In the worst case, where the integers stored are not closely grouped,
% a sparse_bitset will take more memory than an ordinary set, but
% the operations should not be too much slower.
%
% In the asymptotic complexities of the operations below,
% `rep_size(Set)' is the number of Offset/Bits pairs needed to represent Set,
% and `card(Set)' is the cardinality of Set (i.e. its number of elements).
%
%--------------------------------------------------%
%
% The sparse_bitset, fat_sparse_bitset and fatter_sparse_bitset modules
% all use minor variations of the same data structure. To show the differences
% between them, we will use an example bitset containing four offset/bits
% pairs, with the offsets being 0, 64, 256 and 512; the values of the bits
% fields do not matter. We will assume that the word size is 64 bits.
%
% - The sparse_bitset module will store this set using eight memory cells,
% all of them containing two words. Four cells will contain the offset/bits
% pairs, and four will be the cons cells linking them together.
%
% - The fat_sparse_bitset module will store this set using four memory cells,
% all of them containing three words: an offset, the corresponding bits,
% and the pointer to the next cell.
%
% - The fatter_sparse_bitset module will store this set using three memory
% cells, all of them containing four words: an offset, *two* words of
% corresponding bits, and the pointer to the next cell.
%
% In each of the first two representations, the cells' bits fields
% will contain information about the presence/absence in the set of items
% 0-63, 64-127, 256-319, and 512-575 respectively.
% In the third representation, the cells' bits fields will contain information
% about the presence/absence in the set of items 0-127, 256-383, and 512-639
% respectively.
%
% The fat and fatter sparse_bitset representations have the advantage over
% the base sparse_bitset representation that processing the list requires
% following fewer pointers, since fetching one cell gets you the offset,
% the bits, and the address of the next cell all at once. This can be expected
% to result in fewer cache misses. However, the cons cell and the corresponding
% offset/bits cell in the sparse_bitset representation are very likely to be
% allocated together, which means that a good memory allocator may return
% two cells near each other for them.
%
% The fat_sparse_bitset implementation effectively replaces two two-word cells
% with a single three-word cell. This reduces initialization cost from writing
% four words to memory to writing three, but this is usually immaterial,
% because on todays processors, which are usually superscalar and have write
% buffers, those writes are effectively free. There is also no real advantage
% in memory usage, because most memory allocators will allocate a four-word
% block for a three-word request. This includes the Boehm-Demers-Weiser system
% usually used with Mercury.
%
% The fatter_sparse_bitset representation is intended to avoid this last
% problem with the fat_sparse_bitset representation, by repurposing the
% unused word allocated in each cell to represent up to another word's worth
% of items. In cases where this word contains at least one set bit, this
% is a worthwhile gain; in cases where this word is usually all zeroes,
% there will be no gain, but the (thanks to write buffers) the initialization
% cost will typically be negligible as well. There is the additional cost
% that each operation on the bits associated with an offset will either
% have to select which word of bits it applies to, or has to apply to both
% words of bits, but unless typical sets are so sparse that one word of bits
% in each cell is almost always empty, the reduction in the number of cells
% that have to be visited will more than compensate for those costs.
%
%--------------------------------------------------%
:- module fatter_sparse_bitset.
:- interface.
:- import_module enum.
:- import_module list.
:- import_module term.
:- use_module set.
%--------------------------------------------------%
:- type fatter_sparse_bitset(T). % <= uenum(T).
%--------------------------------------------------%
%
% Initial creation of sets.
%
% Return an empty set.
%
:- func init = fatter_sparse_bitset(T).
:- pred init(fatter_sparse_bitset(T)::out) is det.
% Note: set.m contains the reverse mode of this predicate, but it is
% difficult to implement both modes using the representation in this
% module.
%
:- pred singleton_set(fatter_sparse_bitset(T)::out, T::in) is det <= uenum(T).
% make_singleton_set(Item) returns a set containing just the single Item.
%
:- func make_singleton_set(T) = fatter_sparse_bitset(T) <= uenum(T).
%--------------------------------------------------%
%
% Emptiness and singleton-ness tests.
%
:- pred is_empty(fatter_sparse_bitset(T)::in) is semidet.
:- pred is_non_empty(fatter_sparse_bitset(T)::in) is semidet.
% Is the given set a singleton, and if yes, what is the element?
%
:- pred is_singleton(fatter_sparse_bitset(T)::in, T::out) is semidet
<= uenum(T).
%--------------------------------------------------%
%
% Membership tests.
%
% member(Item, Set) is true if-and-only-if Item is a member of Set.
% Takes O(rep_size(Set)) time.
%
:- pred member(T, fatter_sparse_bitset(T)) <= uenum(T).
:- mode member(in, in) is semidet.
:- mode member(out, in) is nondet.
% contains(Set, Item) is true if-and-only-if Item is a member of Set.
% Takes O(rep_size(Set)) time.
%
:- pred contains(fatter_sparse_bitset(T)::in, T::in) is semidet <= uenum(T).
:- pred nondet_member(fatter_sparse_bitset(T)::in, T::out) is nondet
<= uenum(T).
%--------------------------------------------------%
%
% Insertions and deletions.
%
% insert(Set, Item) returns the union of Set and the set containing
% only Item. Takes O(rep_size(Set)) time and space.
%
:- func insert(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T)
<= uenum(T).
:- pred insert(T::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% insert_new(Item, Set0, Set) returns the union of Set0 and the set
% containing only Item if Set0 does not already contain Item; if it does,
% it fails. Takes O(rep_size(Set)) time and space.
%
:- pred insert_new(T::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet
<= uenum(T).
% insert_list(Set, Item) returns the union of Set and the set containing
% only the members of Item. Same as `union(Set, list_to_set(Item))',
% but may be more efficient.
%
:- func insert_list(fatter_sparse_bitset(T), list(T)) = fatter_sparse_bitset(T)
<= uenum(T).
:- pred insert_list(list(T)::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
%--------------------------------------------------%
% delete(Set, Item) returns the difference between Set and the set
% containing only Item. Takes O(rep_size(Set)) time and space.
%
:- func delete(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T)
<= uenum(T).
:- pred delete(T::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% delete_list(Set, Items) returns the difference between Set and the set
% containing only the members of Items. Same as
% `difference(Set, list_to_set(Items))', but may be more efficient.
%
:- func delete_list(fatter_sparse_bitset(T), list(T)) = fatter_sparse_bitset(T)
<= uenum(T).
:- pred delete_list(list(T)::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% remove(Item, Set0, Set) returns in Set the difference of Set0
% and the set containing only Item, failing if Set0 does not contain Item.
% Takes O(rep_size(Set)) time and space.
%
:- pred remove(T::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet
<= uenum(T).
% remove_list(Items, Set0, Set) returns in Set the difference of Set0
% and the set containing all the elements of Items, failing if any element
% of Item is not in Set0. Same as `subset(list_to_set(Items), Set0),
% difference(Set0, list_to_set(Items), Set)', but may be more efficient.
%
:- pred remove_list(list(T)::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet
<= uenum(T).
% remove_leq(Set, Item) returns returns the set containing all the
% elements of Set whose enum forms are strictly greater than
% the enum form of Item.
%
:- func remove_leq(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T)
<= uenum(T).
:- pred remove_leq(T::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% remove_gt(Set, Item) returns returns the set containing all the elements
% of Set whose enum forms are less than or equal to the enum form of Item.
%
:- func remove_gt(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T)
<= uenum(T).
:- pred remove_gt(T::in,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% remove_least(Set0, Item, Set) is true if-and-only-if Item is the element
% whose enum form is the smallest in Set0, and Set is the set
% which contains all the elements of Set0 except Item. Takes O(1) time
% and space.
%
:- pred remove_least(T::out,
fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet
<= uenum(T).
%--------------------------------------------------%
%
% Comparisons between sets.
%
% equal(SetA, SetB) is true if-and-only-if SetA and SetB contain
% the same elements. Takes O(min(rep_size(SetA), rep_size(SetB))) time.
%
:- pred equal(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in)
is semidet.
% subset(Subset, Set) is true if-and-only-if Subset is a subset of Set.
% Same as `intersect(Set, Subset, Subset)', but may be more efficient.
%
:- pred subset(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in)
is semidet.
% superset(Superset, Set) is true if-and-only-if
% Superset is a superset of Set.
% Same as `intersect(Superset, Set, Set)', but may be more efficient.
%
:- pred superset(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in)
is semidet.
%--------------------------------------------------%
%
% Operations on two or more sets.
%
% union(SetA, SetB) returns the union of SetA and SetB. The
% efficiency of the union operation is not sensitive to the argument
% ordering. Takes O(rep_size(SetA) + rep_size(SetB)) time and space.
%
:- func union(fatter_sparse_bitset(T), fatter_sparse_bitset(T))
= fatter_sparse_bitset(T).
:- pred union(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in,
fatter_sparse_bitset(T)::out) is det.
% union_list(Sets, Set) returns the union of all the sets in Sets.
%
:- func union_list(list(fatter_sparse_bitset(T))) = fatter_sparse_bitset(T).
:- pred union_list(list(fatter_sparse_bitset(T))::in,
fatter_sparse_bitset(T)::out) is det.
% intersect(SetA, SetB) returns the intersection of SetA and SetB.
% The efficiency of the intersection operation is not sensitive to the
% argument ordering. Takes O(rep_size(SetA) + rep_size(SetB)) time and
% O(min(rep_size(SetA)), rep_size(SetB)) space.
%
:- func intersect(fatter_sparse_bitset(T), fatter_sparse_bitset(T))
= fatter_sparse_bitset(T).
:- pred intersect(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in,
fatter_sparse_bitset(T)::out) is det.
% intersect_list(Sets, Set) returns the intersection of all the sets
% in Sets.
%
:- func intersect_list(list(fatter_sparse_bitset(T))) =
fatter_sparse_bitset(T).
:- pred intersect_list(list(fatter_sparse_bitset(T))::in,
fatter_sparse_bitset(T)::out) is det.
% difference(SetA, SetB) returns the set containing all the elements
% of SetA except those that occur in SetB. Takes
% O(rep_size(SetA) + rep_size(SetB)) time and O(rep_size(SetA)) space.
%
:- func difference(fatter_sparse_bitset(T), fatter_sparse_bitset(T))
= fatter_sparse_bitset(T).
:- pred difference(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in,
fatter_sparse_bitset(T)::out) is det.
%--------------------------------------------------%
%
% Operations that divide a set into two parts.
%
% divide(Pred, Set, InPart, OutPart):
%
% InPart will consist of those elements of Set for which Pred succeeds;
% OutPart will consist of those elements of Set for which Pred fails.
%
:- pred divide(pred(T)::in(pred(in) is semidet), fatter_sparse_bitset(T)::in,
fatter_sparse_bitset(T)::out, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% divide_by_set(DivideBySet, Set, InPart, OutPart):
%
% InPart will consist of those elements of Set which are also in
% DivideBySet; OutPart will consist of those elements of Set
% which are not in DivideBySet.
%
:- pred divide_by_set(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in,
fatter_sparse_bitset(T)::out, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
%--------------------------------------------------%
%
% Converting lists to sets.
%
% list_to_set(List) returns a set containing only the members of List.
% In the worst case this will take O(length(List)^2) time and space.
% If the elements of the list are closely grouped, the complexity
% will be closer to O(length(List)).
%
:- func list_to_set(list(T)) = fatter_sparse_bitset(T) <= uenum(T).
:- pred list_to_set(list(T)::in, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% A synonym for list_to_set/1.
%
:- func from_list(list(T)) = fatter_sparse_bitset(T) <= uenum(T).
% sorted_list_to_set(List) returns a set containing only the members
% of List. List must be sorted. Takes O(length(List)) time and space.
%
:- func sorted_list_to_set(list(T)) = fatter_sparse_bitset(T) <= uenum(T).
:- pred sorted_list_to_set(list(T)::in, fatter_sparse_bitset(T)::out)
is det <= uenum(T).
% A synonym for sorted_list_to_set/1.
%
:- func from_sorted_list(list(T)) = fatter_sparse_bitset(T) <= uenum(T).
%--------------------------------------------------%
%
% Converting sets to lists.
%
% to_sorted_list(Set) returns a list containing all the members of Set,
% in sorted order. Takes O(card(Set)) time and space.
%
:- func to_sorted_list(fatter_sparse_bitset(T)) = list(T) <= uenum(T).
:- pred to_sorted_list(fatter_sparse_bitset(T)::in, list(T)::out) is det
<= uenum(T).
%--------------------------------------------------%
%
% Converting between different kinds of sets.
%
% from_set(Set) returns a bitset containing only the members of Set.
% Takes O(card(Set)) time and space.
%
:- func from_set(set.set(T)) = fatter_sparse_bitset(T) <= uenum(T).
% to_set(Set) returns a set.set containing all the members of Set,
% Takes O(card(Set)) time and space.
%
:- func to_set(fatter_sparse_bitset(T)) = set.set(T) <= uenum(T).
%--------------------------------------------------%
%
% Counting.
%
% count(Set) returns the number of elements in Set.
% Takes O(card(Set)) time.
%
:- func count(fatter_sparse_bitset(T)) = int <= uenum(T).
%--------------------------------------------------%
%
% Standard higher order functions on collections.
%
% all_true(Pred, Set) succeeds if-and-only-if Pred(Element) succeeds
% for all the elements of Set.
%
:- pred all_true(pred(T)::in(pred(in) is semidet), fatter_sparse_bitset(T)::in)
is semidet <= uenum(T).
% filter(Pred, Set) returns the elements of Set for which Pred succeeds.
%
:- func filter(pred(T)::in(pred(in) is semidet), fatter_sparse_bitset(T)::in)
= (fatter_sparse_bitset(T)::out) is det <= uenum(T).
% filter(Pred, Set, TrueSet, FalseSet) returns the elements of Set
% for which Pred succeeds, and those for which it fails.
%
:- pred filter(pred(T)::in(pred(in) is semidet),
fatter_sparse_bitset(T)::in,
fatter_sparse_bitset(T)::out, fatter_sparse_bitset(T)::out) is det
<= uenum(T).
% foldl(Func, Set, Start) calls Func with each element of Set
% (in sorted order) and an accumulator (with the initial value of Start),
% and returns the final value. Takes O(card(Set)) time.
%
:- func foldl(func(T, A) = A, fatter_sparse_bitset(T), A) = A <= uenum(T).
:- pred foldl(pred(T, A, A), fatter_sparse_bitset(T), A, A) <= uenum(T).
:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
:- mode foldl(in(pred(in, mdi, muo) is det), in, mdi, muo) is det.
:- mode foldl(in(pred(in, di, uo) is det), in, di, uo) is det.
:- mode foldl(in(pred(in, in, out) is semidet), in, in, out) is semidet.
:- mode foldl(in(pred(in, mdi, muo) is semidet), in, mdi, muo) is semidet.
:- mode foldl(in(pred(in, di, uo) is semidet), in, di, uo) is semidet.
:- mode foldl(in(pred(in, in, out) is nondet), in, in, out) is nondet.
:- mode foldl(in(pred(in, in, out) is cc_multi), in, in, out) is cc_multi.
:- mode foldl(in(pred(in, di, uo) is cc_multi), in, di, uo) is cc_multi.
:- pred foldl2(pred(T, A, A, B, B), fatter_sparse_bitset(T), A, A, B, B)
<= uenum(T).
:- mode foldl2(in(pred(in, in, out, in, out) is det),
in, in, out, in, out) is det.
:- mode foldl2(in(pred(in, in, out, mdi, muo) is det),
in, in, out, mdi, muo) is det.
:- mode foldl2(in(pred(in, in, out, di, uo) is det),
in, in, out, di, uo) is det.
:- mode foldl2(in(pred(in, di, uo, di, uo) is det),
in, di, uo, di, uo) is det.
:- mode foldl2(in(pred(in, in, out, in, out) is semidet),
in, in, out, in, out) is semidet.
:- mode foldl2(in(pred(in, in, out, mdi, muo) is semidet),
in, in, out, mdi, muo) is semidet.
:- mode foldl2(in(pred(in, in, out, di, uo) is semidet),
in, in, out, di, uo) is semidet.
:- mode foldl2(in(pred(in, in, out, in, out) is nondet),
in, in, out, in, out) is nondet.
:- mode foldl2(in(pred(in, in, out, in, out) is cc_multi),
in, in, out, in, out) is cc_multi.
:- mode foldl2(in(pred(in, in, out, di, uo) is cc_multi),
in, in, out, di, uo) is cc_multi.
:- mode foldl2(in(pred(in, di, uo, di, uo) is cc_multi),
in, di, uo, di, uo) is cc_multi.
% foldr(Func, Set, Start) calls Func with each element of Set
% (in reverse sorted order) and an accumulator (with the initial value
% of Start), and returns the final value. Takes O(card(Set)) time.
%
:- func foldr(func(T, A) = A, fatter_sparse_bitset(T), A) = A <= uenum(T).
:- pred foldr(pred(T, A, A), fatter_sparse_bitset(T), A, A) <= uenum(T).
:- mode foldr(in(pred(in, in, out) is det), in, in, out) is det.
:- mode foldr(in(pred(in, mdi, muo) is det), in, mdi, muo) is det.
:- mode foldr(in(pred(in, di, uo) is det), in, di, uo) is det.
:- mode foldr(in(pred(in, in, out) is semidet), in, in, out) is semidet.
:- mode foldr(in(pred(in, mdi, muo) is semidet), in, mdi, muo) is semidet.
:- mode foldr(in(pred(in, di, uo) is semidet), in, di, uo) is semidet.
:- mode foldr(in(pred(in, in, out) is nondet), in, in, out) is nondet.
:- mode foldr(in(pred(in, in, out) is cc_multi), in, in, out) is cc_multi.
:- mode foldr(in(pred(in, di, uo) is cc_multi), in, di, uo) is cc_multi.
:- pred foldr2(pred(T, A, A, B, B), fatter_sparse_bitset(T), A, A, B, B)
<= uenum(T).
:- mode foldr2(in(pred(in, in, out, in, out) is det),
in, in, out, in, out) is det.
:- mode foldr2(in(pred(in, in, out, mdi, muo) is det),
in, in, out, mdi, muo) is det.
:- mode foldr2(in(pred(in, in, out, di, uo) is det),
in, in, out, di, uo) is det.
:- mode foldr2(in(pred(in, di, uo, di, uo) is det),
in, di, uo, di, uo) is det.
:- mode foldr2(in(pred(in, in, out, in, out) is semidet),
in, in, out, in, out) is semidet.
:- mode foldr2(in(pred(in, in, out, mdi, muo) is semidet),
in, in, out, mdi, muo) is semidet.
:- mode foldr2(in(pred(in, in, out, di, uo) is semidet),
in, in, out, di, uo) is semidet.
:- mode foldr2(in(pred(in, in, out, in, out) is nondet),
in, in, out, in, out) is nondet.
:- mode foldr2(in(pred(in, di, uo, di, uo) is cc_multi),
in, di, uo, di, uo) is cc_multi.
:- mode foldr2(in(pred(in, in, out, di, uo) is cc_multi),
in, in, out, di, uo) is cc_multi.
:- mode foldr2(in(pred(in, in, out, in, out) is cc_multi),
in, in, out, in, out) is cc_multi.
%--------------------------------------------------%
%--------------------------------------------------%
Next: float, Previous: fat_sparse_bitset, Up: Top [Contents]