Next: float, Previous: fat_sparse_bitset, Up: Top [Contents]
%--------------------------------------------------% % vim: ts=4 sw=4 et ft=mercury %--------------------------------------------------% % Copyright (C) 2011-2012 The University of Melbourne. % Copyright (C) 2014, 2016-2022 The Mercury team. % This file is distributed under the terms specified in COPYING.LIB. %--------------------------------------------------% % % File: fatter_sparse_bitset.m. % Author: zs. % Stability: medium. % % This module provides an abstract data type for storing sets of items % that can each be represented by non-negative integers. % If the integers being stored are closely grouped, a sparse_bitset % will be much more compact than either the list-of-elements representations % provided by set.m, set_ordlist.m, and set_unordlist.m, or the % tree-of-elements representations provided by set_bbbtree.m, set_tree234. % or set_ctree234.m. % % A sparse bitset is represented as a sorted list, with each element % of this list containing two unsigned integers: Offset and Bits. % Offset will always be a multiple of uint.ubits_per_uint, and % the bits of Bits describe which of the elements of the range % Offset .. (Offset + ubits_per_uint - 1) are in the set. % The value of Bits must not be zero; any operation that would clear % all the bits in Bits must also delete the whole list element. % As one goes from the head towards the tail of the list, the offsets of % the list elements must strictly increase. % % The values of Offset in the list need not be *contiguous* multiples % of ubits_per_uint, hence the name *sparse* bitset. % % A sparse_bitset is suitable for storing sets of integers which % can be represented using only a few Offset/Bits pairs. % In the worst case, where the integers stored are not closely grouped, % a sparse_bitset will take more memory than an ordinary set, but % the operations should not be too much slower. % % In the asymptotic complexities of the operations below, % `rep_size(Set)' is the number of Offset/Bits pairs needed to represent Set, % and `card(Set)' is the cardinality of Set (i.e. its number of elements). % %--------------------------------------------------% % % The sparse_bitset, fat_sparse_bitset and fatter_sparse_bitset modules % all use minor variations of the same data structure. To show the differences % between them, we will use an example bitset containing four offset/bits % pairs, with the offsets being 0, 64, 256 and 512; the values of the bits % fields do not matter. We will assume that the word size is 64 bits. % % - The sparse_bitset module will store this set using eight memory cells, % all of them containing two words. Four cells will contain the offset/bits % pairs, and four will be the cons cells linking them together. % % - The fat_sparse_bitset module will store this set using four memory cells, % all of them containing three words: an offset, the corresponding bits, % and the pointer to the next cell. % % - The fatter_sparse_bitset module will store this set using three memory % cells, all of them containing four words: an offset, *two* words of % corresponding bits, and the pointer to the next cell. % % In each of the first two representations, the cells' bits fields % will contain information about the presence/absence in the set of items % 0-63, 64-127, 256-319, and 512-575 respectively. % In the third representation, the cells' bits fields will contain information % about the presence/absence in the set of items 0-127, 256-383, and 512-639 % respectively. % % The fat and fatter sparse_bitset representations have the advantage over % the base sparse_bitset representation that processing the list requires % following fewer pointers, since fetching one cell gets you the offset, % the bits, and the address of the next cell all at once. This can be expected % to result in fewer cache misses. However, the cons cell and the corresponding % offset/bits cell in the sparse_bitset representation are very likely to be % allocated together, which means that a good memory allocator may return % two cells near each other for them. % % The fat_sparse_bitset implementation effectively replaces two two-word cells % with a single three-word cell. This reduces initialization cost from writing % four words to memory to writing three, but this is usually immaterial, % because on todays processors, which are usually superscalar and have write % buffers, those writes are effectively free. There is also no real advantage % in memory usage, because most memory allocators will allocate a four-word % block for a three-word request. This includes the Boehm-Demers-Weiser system % usually used with Mercury. % % The fatter_sparse_bitset representation is intended to avoid this last % problem with the fat_sparse_bitset representation, by repurposing the % unused word allocated in each cell to represent up to another word's worth % of items. In cases where this word contains at least one set bit, this % is a worthwhile gain; in cases where this word is usually all zeroes, % there will be no gain, but the (thanks to write buffers) the initialization % cost will typically be negligible as well. There is the additional cost % that each operation on the bits associated with an offset will either % have to select which word of bits it applies to, or has to apply to both % words of bits, but unless typical sets are so sparse that one word of bits % in each cell is almost always empty, the reduction in the number of cells % that have to be visited will more than compensate for those costs. % %--------------------------------------------------% :- module fatter_sparse_bitset. :- interface. :- import_module enum. :- import_module list. :- import_module term. :- use_module set. %--------------------------------------------------% :- type fatter_sparse_bitset(T). % <= uenum(T). %--------------------------------------------------% % % Initial creation of sets. % % Return an empty set. % :- func init = fatter_sparse_bitset(T). :- pred init(fatter_sparse_bitset(T)::out) is det. % Note: set.m contains the reverse mode of this predicate, but it is % difficult to implement both modes using the representation in this % module. % :- pred singleton_set(fatter_sparse_bitset(T)::out, T::in) is det <= uenum(T). % make_singleton_set(Item) returns a set containing just the single Item. % :- func make_singleton_set(T) = fatter_sparse_bitset(T) <= uenum(T). %--------------------------------------------------% % % Emptiness and singleton-ness tests. % :- pred is_empty(fatter_sparse_bitset(T)::in) is semidet. :- pred is_non_empty(fatter_sparse_bitset(T)::in) is semidet. % Is the given set a singleton, and if yes, what is the element? % :- pred is_singleton(fatter_sparse_bitset(T)::in, T::out) is semidet <= uenum(T). %--------------------------------------------------% % % Membership tests. % % member(Item, Set) is true iff Item is a member of Set. % Takes O(rep_size(Set)) time. % :- pred member(T, fatter_sparse_bitset(T)) <= uenum(T). :- mode member(in, in) is semidet. :- mode member(out, in) is nondet. % contains(Set, Item) is true iff Item is a member of Set. % Takes O(rep_size(Set)) time. % :- pred contains(fatter_sparse_bitset(T)::in, T::in) is semidet <= uenum(T). %--------------------------------------------------% % % Insertions and deletions. % % insert(Set, Item) returns the union of Set and the set containing % only Item. Takes O(rep_size(Set)) time and space. % :- func insert(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T) <= uenum(T). :- pred insert(T::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). % insert_new(Item, Set0, Set) returns the union of Set0 and the set % containing only Item if Set0 does not already contain Item; if it does, % it fails. Takes O(rep_size(Set)) time and space. % :- pred insert_new(T::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet <= uenum(T). % insert_list(Set, Item) returns the union of Set and the set containing % only the members of Item. Same as `union(Set, list_to_set(Item))', % but may be more efficient. % :- func insert_list(fatter_sparse_bitset(T), list(T)) = fatter_sparse_bitset(T) <= uenum(T). :- pred insert_list(list(T)::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). %--------------------------------------------------% % delete(Set, Item) returns the difference of Set and the set containing % only Item. Takes O(rep_size(Set)) time and space. % :- func delete(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T) <= uenum(T). :- pred delete(T::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). % delete_list(Set, Item) returns the difference of Set and the set % containing only the members of Item. Same as % `difference(Set, list_to_set(Item))', but may be more efficient. % :- func delete_list(fatter_sparse_bitset(T), list(T)) = fatter_sparse_bitset(T) <= uenum(T). :- pred delete_list(list(T)::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). % remove(Item, Set0, Set) returns in Set the difference of Set0 % and the set containing only Item, failing if Set0 does not contain Item. % Takes O(rep_size(Set)) time and space. % :- pred remove(T::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet <= uenum(T). % remove_list(Item, Set0, Set) returns in Set the difference of Set0 % and the set containing all the elements of Item, failing if any element % of Item is not in Set0. Same as `subset(list_to_set(Item), Set0), % difference(Set0, list_to_set(Item), Set)', but may be more efficient. % :- pred remove_list(list(T)::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet <= uenum(T). % remove_leq(Set, Item) returns Set with all elements less than or equal % to Item removed. In other words, it returns the set containing all the % elements of Set whose enum forms are greater than the enum form of Item. % :- func remove_leq(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T) <= uenum(T). :- pred remove_leq(T::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). % remove_gt(Set, Item) returns Set with all elements greater than Item % removed. In other words, it returns the set containing all the elements % of Set whose enum forms are less than or equal to the enum form of Item. % :- func remove_gt(fatter_sparse_bitset(T), T) = fatter_sparse_bitset(T) <= uenum(T). :- pred remove_gt(T::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). % remove_least(Set0, Item, Set) is true iff Item is the element % whose enum form is the smallest in Set0, and Set is the set % which contains all the elements of Set0 except Item. Takes O(1) time % and space. % :- pred remove_least(T::out, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is semidet <= uenum(T). %--------------------------------------------------% % % Comparisons between sets. % % equal(SetA, SetB) is true iff SetA and SetB contain the same elements. % Takes O(min(rep_size(SetA), rep_size(SetB))) time. % :- pred equal(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in) is semidet. % subset(Subset, Set) is true iff Subset is a subset of Set. % Same as `intersect(Set, Subset, Subset)', but may be more efficient. % :- pred subset(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in) is semidet. % superset(Superset, Set) is true iff Superset is a superset of Set. % Same as `intersect(Superset, Set, Set)', but may be more efficient. % :- pred superset(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in) is semidet. %--------------------------------------------------% % % Operations on two or more sets. % % union(SetA, SetB) returns the union of SetA and SetB. The % efficiency of the union operation is not sensitive to the argument % ordering. Takes O(rep_size(SetA) + rep_size(SetB)) time and space. % :- func union(fatter_sparse_bitset(T), fatter_sparse_bitset(T)) = fatter_sparse_bitset(T). :- pred union(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det. % union_list(Sets, Set) returns the union of all the sets in Sets. % :- func union_list(list(fatter_sparse_bitset(T))) = fatter_sparse_bitset(T). :- pred union_list(list(fatter_sparse_bitset(T))::in, fatter_sparse_bitset(T)::out) is det. % intersect(SetA, SetB) returns the intersection of SetA and SetB. % The efficiency of the intersection operation is not sensitive to the % argument ordering. Takes O(rep_size(SetA) + rep_size(SetB)) time and % O(min(rep_size(SetA)), rep_size(SetB)) space. % :- func intersect(fatter_sparse_bitset(T), fatter_sparse_bitset(T)) = fatter_sparse_bitset(T). :- pred intersect(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det. % intersect_list(Sets, Set) returns the intersection of all the sets % in Sets. % :- func intersect_list(list(fatter_sparse_bitset(T))) = fatter_sparse_bitset(T). :- pred intersect_list(list(fatter_sparse_bitset(T))::in, fatter_sparse_bitset(T)::out) is det. % difference(SetA, SetB) returns the set containing all the elements % of SetA except those that occur in SetB. Takes % O(rep_size(SetA) + rep_size(SetB)) time and O(rep_size(SetA)) space. % :- func difference(fatter_sparse_bitset(T), fatter_sparse_bitset(T)) = fatter_sparse_bitset(T). :- pred difference(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out) is det. %--------------------------------------------------% % % Operations that divide a set into two parts. % % divide(Pred, Set, InPart, OutPart): % InPart consists of those elements of Set for which Pred succeeds; % OutPart consists of those elements of Set for which Pred fails. % :- pred divide(pred(T)::in(pred(in) is semidet), fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out, fatter_sparse_bitset(T)::out) is det <= uenum(T). % divide_by_set(DivideBySet, Set, InPart, OutPart): % InPart consists of those elements of Set which are also in DivideBySet; % OutPart consists of those elements of Set which are not in DivideBySet. % :- pred divide_by_set(fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out, fatter_sparse_bitset(T)::out) is det <= uenum(T). %--------------------------------------------------% % % Converting lists to sets. % % list_to_set(List) returns a set containing only the members of List. % In the worst case this will take O(length(List)^2) time and space. % If the elements of the list are closely grouped, it will be closer % to O(length(List)). % :- func list_to_set(list(T)) = fatter_sparse_bitset(T) <= uenum(T). :- pred list_to_set(list(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). % sorted_list_to_set(List) returns a set containing only the members % of List. List must be sorted. Takes O(length(List)) time and space. % :- func sorted_list_to_set(list(T)) = fatter_sparse_bitset(T) <= uenum(T). :- pred sorted_list_to_set(list(T)::in, fatter_sparse_bitset(T)::out) is det <= uenum(T). %--------------------------------------------------% % % Converting sets to lists. % % to_sorted_list(Set) returns a list containing all the members of Set, % in sorted order. Takes O(card(Set)) time and space. % :- func to_sorted_list(fatter_sparse_bitset(T)) = list(T) <= uenum(T). :- pred to_sorted_list(fatter_sparse_bitset(T)::in, list(T)::out) is det <= uenum(T). %--------------------------------------------------% % % Converting between different kinds of sets. % % from_set(Set) returns a bitset containing only the members of Set. % Takes O(card(Set)) time and space. % :- func from_set(set.set(T)) = fatter_sparse_bitset(T) <= uenum(T). % to_set(Set) returns a set.set containing all the members of Set, % Takes O(card(Set)) time and space. % :- func to_set(fatter_sparse_bitset(T)) = set.set(T) <= uenum(T). %--------------------------------------------------% % % Counting. % % count(Set) returns the number of elements in Set. % Takes O(card(Set)) time. % :- func count(fatter_sparse_bitset(T)) = int <= uenum(T). %--------------------------------------------------% % % Standard higher order functions on collections. % % all_true(Pred, Set) succeeds iff Pred(Element) succeeds % for all the elements of Set. % :- pred all_true(pred(T)::in(pred(in) is semidet), fatter_sparse_bitset(T)::in) is semidet <= uenum(T). % filter(Pred, Set) returns the elements of Set for which Pred succeeds. % :- func filter(pred(T)::in(pred(in) is semidet), fatter_sparse_bitset(T)::in) = (fatter_sparse_bitset(T)::out) is det <= uenum(T). % filter(Pred, Set, TrueSet, FalseSet) returns the elements of Set % for which Pred succeeds, and those for which it fails. % :- pred filter(pred(T)::in(pred(in) is semidet), fatter_sparse_bitset(T)::in, fatter_sparse_bitset(T)::out, fatter_sparse_bitset(T)::out) is det <= uenum(T). % foldl(Func, Set, Start) calls Func with each element of Set % (in sorted order) and an accumulator (with the initial value of Start), % and returns the final value. Takes O(card(Set)) time. % :- func foldl(func(T, U) = U, fatter_sparse_bitset(T), U) = U <= uenum(T). :- pred foldl(pred(T, U, U), fatter_sparse_bitset(T), U, U) <= uenum(T). :- mode foldl(in(pred(in, in, out) is det), in, in, out) is det. :- mode foldl(in(pred(in, mdi, muo) is det), in, mdi, muo) is det. :- mode foldl(in(pred(in, di, uo) is det), in, di, uo) is det. :- mode foldl(in(pred(in, in, out) is semidet), in, in, out) is semidet. :- mode foldl(in(pred(in, mdi, muo) is semidet), in, mdi, muo) is semidet. :- mode foldl(in(pred(in, di, uo) is semidet), in, di, uo) is semidet. :- mode foldl(in(pred(in, in, out) is nondet), in, in, out) is nondet. :- mode foldl(in(pred(in, in, out) is cc_multi), in, in, out) is cc_multi. :- mode foldl(in(pred(in, di, uo) is cc_multi), in, di, uo) is cc_multi. :- pred foldl2(pred(T, U, U, V, V), fatter_sparse_bitset(T), U, U, V, V) <= uenum(T). :- mode foldl2(in(pred(in, in, out, in, out) is det), in, in, out, in, out) is det. :- mode foldl2(in(pred(in, in, out, mdi, muo) is det), in, in, out, mdi, muo) is det. :- mode foldl2(in(pred(in, in, out, di, uo) is det), in, in, out, di, uo) is det. :- mode foldl2(in(pred(in, di, uo, di, uo) is det), in, di, uo, di, uo) is det. :- mode foldl2(in(pred(in, in, out, in, out) is semidet), in, in, out, in, out) is semidet. :- mode foldl2(in(pred(in, in, out, mdi, muo) is semidet), in, in, out, mdi, muo) is semidet. :- mode foldl2(in(pred(in, in, out, di, uo) is semidet), in, in, out, di, uo) is semidet. :- mode foldl2(in(pred(in, in, out, in, out) is nondet), in, in, out, in, out) is nondet. :- mode foldl2(in(pred(in, in, out, in, out) is cc_multi), in, in, out, in, out) is cc_multi. :- mode foldl2(in(pred(in, in, out, di, uo) is cc_multi), in, in, out, di, uo) is cc_multi. :- mode foldl2(in(pred(in, di, uo, di, uo) is cc_multi), in, di, uo, di, uo) is cc_multi. % foldr(Func, Set, Start) calls Func with each element of Set % (in reverse sorted order) and an accumulator (with the initial value % of Start), and returns the final value. Takes O(card(Set)) time. % :- func foldr(func(T, U) = U, fatter_sparse_bitset(T), U) = U <= uenum(T). :- pred foldr(pred(T, U, U), fatter_sparse_bitset(T), U, U) <= uenum(T). :- mode foldr(in(pred(in, in, out) is det), in, in, out) is det. :- mode foldr(in(pred(in, mdi, muo) is det), in, mdi, muo) is det. :- mode foldr(in(pred(in, di, uo) is det), in, di, uo) is det. :- mode foldr(in(pred(in, in, out) is semidet), in, in, out) is semidet. :- mode foldr(in(pred(in, mdi, muo) is semidet), in, mdi, muo) is semidet. :- mode foldr(in(pred(in, di, uo) is semidet), in, di, uo) is semidet. :- mode foldr(in(pred(in, in, out) is nondet), in, in, out) is nondet. :- mode foldr(in(pred(in, in, out) is cc_multi), in, in, out) is cc_multi. :- mode foldr(in(pred(in, di, uo) is cc_multi), in, di, uo) is cc_multi. :- pred foldr2(pred(T, U, U, V, V), fatter_sparse_bitset(T), U, U, V, V) <= uenum(T). :- mode foldr2(in(pred(in, in, out, in, out) is det), in, in, out, in, out) is det. :- mode foldr2(in(pred(in, in, out, mdi, muo) is det), in, in, out, mdi, muo) is det. :- mode foldr2(in(pred(in, in, out, di, uo) is det), in, in, out, di, uo) is det. :- mode foldr2(in(pred(in, di, uo, di, uo) is det), in, di, uo, di, uo) is det. :- mode foldr2(in(pred(in, in, out, in, out) is semidet), in, in, out, in, out) is semidet. :- mode foldr2(in(pred(in, in, out, mdi, muo) is semidet), in, in, out, mdi, muo) is semidet. :- mode foldr2(in(pred(in, in, out, di, uo) is semidet), in, in, out, di, uo) is semidet. :- mode foldr2(in(pred(in, in, out, in, out) is nondet), in, in, out, in, out) is nondet. :- mode foldr2(in(pred(in, di, uo, di, uo) is cc_multi), in, di, uo, di, uo) is cc_multi. :- mode foldr2(in(pred(in, in, out, di, uo) is cc_multi), in, in, out, di, uo) is cc_multi. :- mode foldr2(in(pred(in, in, out, in, out) is cc_multi), in, in, out, in, out) is cc_multi. %--------------------------------------------------% %--------------------------------------------------%
Next: float, Previous: fat_sparse_bitset, Up: Top [Contents]