1 <chapter xmlns=
"http://docbook.org/ns/docbook" version=
"5.0"
2 xml:
id=
"manual.ext.containers.pbds" xreflabel=
"pbds">
4 <title>Policy-Based Data Structures
</title>
38 <?dbhtml filename=
"policy_data_structures.html"?>
40 <!-- 2006-04-01 Ami Tavory -->
41 <!-- 2011-05-25 Benjamin Kosnik -->
44 <section xml:
id=
"pbds.intro">
45 <info><title>Intro
</title></info>
48 This is a library of policy-based elementary data structures:
49 associative containers and priority queues. It is designed for
50 high-performance, flexibility, semantic safety, and conformance to
51 the corresponding containers in
<literal>std
</literal> and
52 <literal>std::tr1
</literal> (except for some points where it differs
58 <section xml:
id=
"pbds.intro.issues">
59 <info><title>Performance Issues
</title></info>
64 An attempt is made to categorize the wide variety of possible
65 container designs in terms of performance-impacting factors. These
66 performance factors are translated into design policies and
67 incorporated into container design.
71 There is tension between unravelling factors into a coherent set of
72 policies. Every attempt is made to make a minimal set of
73 factors. However, in many cases multiple factors make for long
74 template names. Every attempt is made to alias and use typedefs in
75 the source files, but the generated names for external symbols can
76 be large for binary files or debuggers.
80 In many cases, the longer names allow capabilities and behaviours
81 controlled by macros to also be unamibiguously emitted as distinct
86 Specific issues found while unraveling performance factors in the
87 design of associative containers and priority queues follow.
90 <section xml:
id=
"pbds.intro.issues.associative">
91 <info><title>Associative
</title></info>
94 Associative containers depend on their composite policies to a very
95 large extent. Implicitly hard-wiring policies can hamper their
96 performance and limit their functionality. An efficient hash-based
97 container, for example, requires policies for testing key
98 equivalence, hashing keys, translating hash values into positions
99 within the hash table, and determining when and how to resize the
100 table internally. A tree-based container can efficiently support
101 order statistics, i.e. the ability to query what is the order of
102 each key within the sequence of keys in the container, but only if
103 the container is supplied with a policy to internally update
104 meta-data. There are many other such examples.
108 Ideally, all associative containers would share the same
109 interface. Unfortunately, underlying data structures and mapping
110 semantics differentiate between different containers. For example,
111 suppose one writes a generic function manipulating an associative
116 template
<typename Cntnr
>
118 some_op_sequence(Cntnr
& r_cnt)
125 Given this, then what can one assume about the instantiating
126 container? The answer varies according to its underlying data
127 structure. If the underlying data structure of
128 <literal>Cntnr
</literal> is based on a tree or trie, then the order
129 of elements is well defined; otherwise, it is not, in general. If
130 the underlying data structure of
<literal>Cntnr
</literal> is based
131 on a collision-chaining hash table, then modifying
132 r_
<literal>Cntnr
</literal> will not invalidate its iterators' order;
133 if the underlying data structure is a probing hash table, then this
134 is not the case. If the underlying data structure is based on a tree
135 or trie, then a reference to the container can efficiently be split;
136 otherwise, it cannot, in general. If the underlying data structure
137 is a red-black tree, then splitting a reference to the container is
138 exception-free; if it is an ordered-vector tree, exceptions can be
144 <section xml:
id=
"pbds.intro.issues.priority_queue">
145 <info><title>Priority Que
</title></info>
148 Priority queues are useful when one needs to efficiently access a
149 minimum (or maximum) value as the set of values changes.
153 Most useful data structures for priority queues have a relatively
154 simple structure, as they are geared toward relatively simple
155 requirements. Unfortunately, these structures do not support access
156 to an arbitrary value, which turns out to be necessary in many
157 algorithms. Say, decreasing an arbitrary value in a graph
158 algorithm. Therefore, some extra mechanism is necessary and must be
159 invented for accessing arbitrary values. There are at least two
160 alternatives: embedding an associative container in a priority
161 queue, or allowing cross-referencing through iterators. The first
162 solution adds significant overhead; the second solution requires a
163 precise definition of iterator invalidation. Which is the next
168 Priority queues, like hash-based containers, store values in an
169 order that is meaningless and undefined externally. For example, a
170 <code>push
</code> operation can internally reorganize the
171 values. Because of this characteristic, describing a priority
172 queues' iterator is difficult: on one hand, the values to which
173 iterators point can remain valid, but on the other, the logical
174 order of iterators can change unpredictably.
178 Roughly speaking, any element that is both inserted to a priority
179 queue (e.g. through
<code>push
</code>) and removed
180 from it (e.g., through
<code>pop
</code>), incurs a
181 logarithmic overhead (in the amortized sense). Different underlying
182 data structures place the actual cost differently: some are
183 optimized for amortized complexity, whereas others guarantee that
184 specific operations only have a constant cost. One underlying data
185 structure might be chosen if modifying a value is frequent
186 (Dijkstra's shortest-path algorithm), whereas a different one might
187 be chosen otherwise. Unfortunately, an array-based binary heap - an
188 underlying data structure that optimizes (in the amortized sense)
189 <code>push
</code> and
<code>pop
</code> operations, differs from the
190 others in terms of its invalidation guarantees. Other design
191 decisions also impact the cost and placement of the overhead, at the
192 expense of more difference in the the kinds of operations that the
193 underlying data structure can support. These differences pose a
194 challenge when creating a uniform interface for priority queues.
199 <section xml:
id=
"pbds.intro.motivation">
200 <info><title>Goals
</title></info>
203 Many fine associative-container libraries were already written,
204 most notably, the C++ standard's associative containers. Why
205 then write another library? This section shows some possible
206 advantages of this library, when considering the challenges in
207 the introduction. Many of these points stem from the fact that
208 the ISO C++ process introduced associative-containers in a
209 two-step process (first standardizing tree-based containers,
210 only then adding hash-based containers, which are fundamentally
211 different), did not standardize priority queues as containers,
212 and (in our opinion) overloads the iterator concept.
215 <section xml:
id=
"pbds.intro.motivation.associative">
216 <info><title>Associative
</title></info>
220 <section xml:
id=
"motivation.associative.policy">
221 <info><title>Policy Choices
</title></info>
223 Associative containers require a relatively large number of
224 policies to function efficiently in various settings. In some
225 cases this is needed for making their common operations more
226 efficient, and in other cases this allows them to support a
227 larger set of operations
233 Hash-based containers, for example, support look-up and
234 insertion methods (
<function>find
</function> and
235 <function>insert
</function>). In order to locate elements
236 quickly, they are supplied a hash functor, which instruct
237 how to transform a key object into some size type; a hash
238 functor might transform
<constant>"hello"</constant>
239 into
<constant>1123002298</constant>. A hash table, though,
240 requires transforming each key object into some size-type
241 type in some specific domain; a hash table with a
128-long
242 table might transform
<constant>"hello"</constant> into
243 position
<constant>63</constant>. The policy by which the
244 hash value is transformed into a position within the table
245 can dramatically affect performance. Hash-based containers
246 also do not resize naturally (as opposed to tree-based
247 containers, for example). The appropriate resize policy is
248 unfortunately intertwined with the policy that transforms
249 hash value into a position within the table.
255 Tree-based containers, for example, also support look-up and
256 insertion methods, and are primarily useful when maintaining
257 order between elements is important. In some cases, though,
258 one can utilize their balancing algorithms for completely
263 Figure A shows a tree whose each node contains two entries:
264 a floating-point key, and some size-type
265 <emphasis>metadata
</emphasis> (in bold beneath it) that is
266 the number of nodes in the sub-tree. (The root has key
0.99,
267 and has
5 nodes (including itself) in its sub-tree.) A
268 container based on this data structure can obviously answer
269 efficiently whether
0.3 is in the container object, but it
270 can also answer what is the order of
0.3 among all those in
271 the container object: see
<xref linkend=
"biblio.clrs2001"/>.
276 As another example, Figure B shows a tree whose each node
277 contains two entries: a half-open geometric line interval,
278 and a number
<emphasis>metadata
</emphasis> (in bold beneath
279 it) that is the largest endpoint of all intervals in its
280 sub-tree. (The root describes the interval
<constant>[
20,
281 36)
</constant>, and the largest endpoint in its sub-tree is
282 99.) A container based on this data structure can obviously
283 answer efficiently whether
<constant>[
3,
41)
</constant> is
284 in the container object, but it can also answer efficiently
285 whether the container object has intervals that intersect
286 <constant>[
3,
41)
</constant>. These types of queries are
287 very useful in geometric algorithms and lease-management
292 It is important to note, however, that as the trees are
293 modified, their internal structure changes. To maintain
294 these invariants, one must supply some policy that is aware
295 of these changes. Without this, it would be better to use a
296 linked list (in itself very efficient for these purposes).
303 <title>Node Invariants
</title>
306 <imagedata align=
"center" format=
"PNG" scale=
"100"
307 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_node_invariants.png"/>
310 <phrase>Node Invariants
</phrase>
317 <section xml:
id=
"motivation.associative.underlying">
318 <info><title>Underlying Data Structures
</title></info>
320 The standard C++ library contains associative containers based on
321 red-black trees and collision-chaining hash tables. These are
322 very useful, but they are not ideal for all types of
327 The figure below shows the different underlying data structures
328 currently supported in this library.
332 <title>Underlying Associative Data Structures
</title>
335 <imagedata align=
"center" format=
"PNG" scale=
"100"
336 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_different_underlying_dss_1.png"/>
339 <phrase>Underlying Associative Data Structures
</phrase>
345 A shows a collision-chaining hash-table, B shows a probing
346 hash-table, C shows a red-black tree, D shows a splay tree, E shows
347 a tree based on an ordered vector(implicit in the order of the
348 elements), F shows a PATRICIA trie, and G shows a list-based
349 container with update policies.
353 Each of these data structures has some performance benefits, in
354 terms of speed, size or both. For now, note that vector-based trees
355 and probing hash tables manipulate memory more efficiently than
356 red-black trees and collision-chaining hash tables, and that
357 list-based associative containers are very useful for constructing
362 Now consider a function manipulating a generic associative
366 template
<class Cntnr
>
368 some_op_sequence(Cntnr
&r_cnt)
375 Ideally, the underlying data structure
376 of
<classname>Cntnr
</classname> would not affect what can be
377 done with
<varname>r_cnt
</varname>. Unfortunately, this is not
382 For example, if
<classname>Cntnr
</classname>
383 is
<classname>std::map
</classname>, then the function can
387 std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)
390 in order to apply
<classname>foobar
</classname> to all
391 elements between
<classname>foo
</classname> and
392 <classname>bar
</classname>. If
393 <classname>Cntnr
</classname> is a hash-based container,
394 then this call's results are undefined.
398 Also, if
<classname>Cntnr
</classname> is tree-based, the type
399 and object of the comparison functor can be
400 accessed. If
<classname>Cntnr
</classname> is hash based, these
401 queries are nonsensical.
405 There are various other differences based on the container's
406 underlying data structure. For one, they can be constructed by,
407 and queried for, different policies. Furthermore:
413 Containers based on C, D, E and F store elements in a
414 meaningful order; the others store elements in a meaningless
415 (and probably time-varying) order. By implication, only
416 containers based on C, D, E and F can
417 support
<function>erase
</function> operations taking an
418 iterator and returning an iterator to the following element
419 without performance loss.
425 Containers based on C, D, E, and F can be split and joined
426 efficiently, while the others cannot. Containers based on C
427 and D, furthermore, can guarantee that this is exception-free;
428 containers based on E cannot guarantee this.
434 Containers based on all but E can guarantee that
435 erasing an element is exception free; containers based on E
436 cannot guarantee this. Containers based on all but B and E
437 can guarantee that modifying an object of their type does
438 not invalidate iterators or references to their elements,
439 while containers based on B and E cannot. Containers based
440 on C, D, and E can furthermore make a stronger guarantee,
441 namely that modifying an object of their type does not
442 affect the order of iterators.
448 A unified tag and traits system (as used for the C++ standard
449 library iterators, for example) can ease generic manipulation of
450 associative containers based on different underlying data
456 <section xml:
id=
"motivation.associative.iterators">
457 <info><title>Iterators
</title></info>
459 Iterators are centric to the design of the standard library
460 containers, because of the container/algorithm/iterator
461 decomposition that allows an algorithm to operate on a range
462 through iterators of some sequence. Iterators, then, are useful
463 because they allow going over a
464 specific
<emphasis>sequence
</emphasis>. The standard library
465 also uses iterators for accessing a
466 specific
<emphasis>element
</emphasis>: when an associative
467 container returns one through
<function>find
</function>. The
468 standard library consistently uses the same types of iterators
469 for both purposes: going over a range, and accessing a specific
470 found element. Before the introduction of hash-based containers
471 to the standard library, this made sense (with the exception of
472 priority queues, which are discussed later).
476 Using the standard associative containers together with
477 non-order-preserving associative containers (and also because of
478 priority-queues container), there is a possible need for
479 different types of iterators for self-organizing containers:
480 the iterator concept seems overloaded to mean two different
481 things (in some cases).
<remark> XXX
482 "ds_gen.html#find_range">Design::Associative
483 Containers::Data-Structure Genericity::Point-Type and Range-Type
487 <section xml:
id=
"associative.iterators.using">
489 <title>Using Point Iterators for Range Operations
</title>
492 Suppose
<classname>cntnr
</classname> is some associative
493 container, and say
<varname>c
</varname> is an object of
494 type
<classname>cntnr
</classname>. Then what will be the outcome
499 std::for_each(c.find(
1), c.find(
5), foo);
503 If
<classname>cntnr
</classname> is a tree-based container
504 object, then an in-order walk will
505 apply
<classname>foo
</classname> to the relevant elements,
506 as in the graphic below, label A. If
<varname>c
</varname> is
507 a hash-based container, then the order of elements between any
508 two elements is undefined (and probably time-varying); there is
509 no guarantee that the elements traversed will coincide with the
510 <emphasis>logical
</emphasis> elements between
1 and
5, as in
515 <title>Range Iteration in Different Data Structures
</title>
518 <imagedata align=
"center" format=
"PNG" scale=
"100"
519 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterators_range_ops_1.png"/>
522 <phrase>Node Invariants
</phrase>
528 In our opinion, this problem is not caused just because
529 red-black trees are order preserving while
530 collision-chaining hash tables are (generally) not - it
531 is more fundamental. Most of the standard's containers
532 order sequences in a well-defined manner that is
533 determined by their
<emphasis>interface
</emphasis>:
534 calling
<function>insert
</function> on a tree-based
535 container modifies its sequence in a predictable way, as
536 does calling
<function>push_back
</function> on a list or
537 a vector. Conversely, collision-chaining hash tables,
538 probing hash tables, priority queues, and list-based
539 containers (which are very useful for
"multimaps") are
540 self-organizing data structures; the effect of each
541 operation modifies their sequences in a manner that is
542 (practically) determined by their
543 <emphasis>implementation
</emphasis>.
547 Consequently, applying an algorithm to a sequence obtained from most
548 containers may or may not make sense, but applying it to a
549 sub-sequence of a self-organizing container does not.
553 <section xml:
id=
"associative.iterators.cost">
555 <title>Cost to Point Iterators to Enable Range Operations
</title>
558 Suppose
<varname>c
</varname> is some collision-chaining
559 hash-based container object, and one calls
561 <programlisting>c.find(
3)
</programlisting>
563 Then what composes the returned iterator?
567 In the graphic below, label A shows the simplest (and
568 most efficient) implementation of a collision-chaining
569 hash table. The little box marked
570 <classname>point_iterator
</classname> shows an object
571 that contains a pointer to the element's node. Note that
572 this
"iterator" has no way to move to the next element (
574 <function>operator++
</function>). Conversely, the little
575 box marked
<classname>iterator
</classname> stores both a
576 pointer to the element, as well as some other
577 information (the bucket number of the element). the
578 second iterator, then, is
"heavier" than the first one-
579 it requires more time and space. If we were to use a
580 different container to cross-reference into this
581 hash-table using these iterators - it would take much
582 more space. As noted above, nothing much can be done by
583 incrementing these iterators, so why is this extra
588 Alternatively, one might create a collision-chaining hash-table
589 where the lists might be linked, forming a monolithic total-element
590 list, as in the graphic below, label B. Here the iterators are as
591 light as can be, but the hash-table's operations are more
596 <title>Point Iteration in Hash Data Structures
</title>
599 <imagedata align=
"center" format=
"PNG" scale=
"100"
600 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterators_range_ops_2.png"/>
603 <phrase>Point Iteration in Hash Data Structures
</phrase>
609 It should be noted that containers based on collision-chaining
610 hash-tables are not the only ones with this type of behavior;
611 many other self-organizing data structures display it as well.
615 <section xml:
id=
"associative.iterators.invalidation">
616 <info><title>Invalidation Guarantees
</title></info>
617 <para>Consider the following snippet:
</para>
624 Following the call to
<classname>erase
</classname>, what is the
625 validity of
<classname>it
</classname>: can it be de-referenced?
626 can it be incremented?
630 The answer depends on the underlying data structure of the
631 container. The graphic below shows three cases: A1 and A2 show
632 a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
633 show a collision-chaining hash table.
637 <title>Effect of erase in different underlying data structures
</title>
640 <imagedata align=
"center" format=
"PNG" scale=
"100"
641 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_invalidation_guarantee_erase.png"/>
644 <phrase>Effect of erase in different underlying data structures
</phrase>
652 Erasing
5 from A1 yields A2. Clearly, an iterator to
3 can
653 be de-referenced and incremented. The sequence of iterators
654 changed, but in a way that is well-defined by the interface.
660 Erasing
5 from B1 yields B2. Clearly, an iterator to
3 is
661 not valid at all - it cannot be de-referenced or
662 incremented; the order of iterators changed in a way that is
663 (practically) determined by the implementation and not by
670 Erasing
5 from C1 yields C2. Here the situation is more
671 complicated. On the one hand, there is no problem in
672 de-referencing
<classname>it
</classname>. On the other hand,
673 the order of iterators changed in a way that is
674 (practically) determined by the implementation and not by
681 So in the standard library containers, it is not always possible
682 to express whether
<varname>it
</varname> is valid or not. This
683 is true also for
<function>insert
</function>. Again, the
684 iterator concept seems overloaded.
687 </section> <!--iterators-->
690 <section xml:
id=
"motivation.associative.functions">
691 <info><title>Functional
</title></info>
696 The design of the functional overlay to the underlying data
697 structures differs slightly from some of the conventions used in
698 the C++ standard. A strict public interface of methods that
699 comprise only operations which depend on the class's internal
700 structure; other operations are best designed as external
701 functions. (See
<xref linkend=
"biblio.meyers02both"/>).With this
702 rubric, the standard associative containers lack some useful
703 methods, and provide other methods which would be better
707 <section xml:
id=
"motivation.associative.functions.erase">
708 <info><title><function>erase
</function></title></info>
713 Order-preserving standard associative containers provide the
722 which takes an iterator, erases the corresponding
723 element, and returns an iterator to the following
724 element. Also standardd hash-based associative
725 containers provide this method. This seemingly
726 increasesgenericity between associative containers,
727 since it is possible to use
730 typename C::iterator it = c.begin();
731 typename C::iterator e_it = c.end();
734 it = pred(*it)? c.erase(it) : ++it;
738 in order to erase from a container object
<varname>
739 c
</varname> all element which match a
740 predicate
<classname>pred
</classname>. However, in a
741 different sense this actually decreases genericity: an
742 integral implication of this method is that tree-based
743 associative containers' memory use is linear in the total
744 number of elements they store, while hash-based
745 containers' memory use is unbounded in the total number of
746 elements they store. Assume a hash-based container is
747 allowed to decrease its size when an element is
748 erased. Then the elements might be rehashed, which means
749 that there is no
"next" element - it is simply
750 undefined. Consequently, it is possible to infer from the
751 fact that the standard library's hash-based containers
752 provide this method that they cannot downsize when
753 elements are erased. As a consequence, different code is
754 needed to manipulate different containers, assuming that
755 memory should be conserved. Therefor, this library's
756 non-order preserving associative containers omit this
763 All associative containers include a conditional-erase method
773 which erases all elements matching a predicate. This is probably the
774 only way to ensure linear-time multiple-item erase which can
775 actually downsize a container.
781 The standard associative containers provide methods for
782 multiple-item erase of the form
789 erasing a range of elements given by a pair of
790 iterators. For tree-based or trie-based containers, this can
791 implemented more efficiently as a (small) sequence of split
792 and join operations. For other, unordered, containers, this
793 method isn't much better than an external loop. Moreover,
794 if
<varname>c
</varname> is a hash-based container,
798 c.erase(c.find(
2), c.find(
5))
801 is almost certain to do something
802 different than erasing all elements whose keys are between
2
803 and
5, and is likely to produce other undefined behavior.
807 </section> <!-- erase -->
809 <section xml:
id=
"motivation.associative.functions.split">
812 <function>split
</function> and
<function>join
</function>
816 It is well-known that tree-based and trie-based container
817 objects can be efficiently split or joined (See
818 <xref linkend=
"biblio.clrs2001"/>). Externally splitting or
819 joining trees is super-linear, and, furthermore, can throw
820 exceptions. Split and join methods, consequently, seem good
821 choices for tree-based container methods, especially, since as
822 noted just before, they are efficient replacements for erasing
826 </section> <!-- split -->
828 <section xml:
id=
"motivation.associative.functions.insert">
831 <function>insert
</function>
835 The standard associative containers provide methods of the form
838 template
<class It
>
844 for inserting a range of elements given by a pair of
845 iterators. At best, this can be implemented as an external loop,
846 or, even more efficiently, as a join operation (for the case of
847 tree-based or trie-based containers). Moreover, these methods seem
848 similar to constructors taking a range given by a pair of
849 iterators; the constructors, however, are transactional, whereas
850 the insert methods are not; this is possibly confusing.
853 </section> <!-- insert -->
855 <section xml:
id=
"motivation.associative.functions.compare">
858 <function>operator==
</function> and
<function>operator
<=
</function>
863 Associative containers are parametrized by policies allowing to
864 test key equivalence: a hash-based container can do this through
865 its equivalence functor, and a tree-based container can do this
866 through its comparison functor. In addition, some standard
867 associative containers have global function operators, like
868 <function>operator==
</function> and
<function>operator
<=
</function>,
869 that allow comparing entire associative containers.
873 In our opinion, these functions are better left out. To begin
874 with, they do not significantly improve over an external
875 loop. More importantly, however, they are possibly misleading -
876 <function>operator==
</function>, for example, usually checks for
877 equivalence, or interchangeability, but the associative
878 container cannot check for values' equivalence, only keys'
879 equivalence; also, are two containers considered equivalent if
880 they store the same values in different order? this is an
883 </section> <!-- compare -->
885 </section> <!-- functional -->
887 </section> <!--associative-->
889 <section xml:
id=
"pbds.intro.motivation.priority_queue">
890 <info><title>Priority Queues
</title></info>
892 <section xml:
id=
"motivation.priority_queue.policy">
893 <info><title>Policy Choices
</title></info>
896 Priority queues are containers that allow efficiently inserting
897 values and accessing the maximal value (in the sense of the
898 container's comparison functor). Their interface
899 supports
<function>push
</function>
900 and
<function>pop
</function>. The standard
901 container
<classname>std::priorityqueue
</classname> indeed support
902 these methods, but little else. For algorithmic and
903 software-engineering purposes, other methods are needed:
909 Many graph algorithms (see
910 <xref linkend=
"biblio.clrs2001"/>) require increasing a
911 value in a priority queue (again, in the sense of the
912 container's comparison functor), or joining two
913 priority-queue objects.
918 <para>The return type of
<classname>priority_queue
</classname>'s
919 <function>push
</function> method is a point-type iterator, which can
920 be used for modifying or erasing arbitrary values. For
923 priority_queue
<int
> p;
924 priority_queue
<int
>::point_iterator it = p.push(
3);
928 <para>These types of cross-referencing operations are necessary
929 for making priority queues useful for different applications,
930 especially graph applications.
</para>
935 It is sometimes necessary to erase an arbitrary value in a
936 priority queue. For example, consider
937 the
<function>select
</function> function for monitoring
943 select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
944 struct timeval *timeout);
947 then, as the select documentation states:
951 The nfds argument specifies the range of file
952 descriptors to be tested. The select() function tests file
953 descriptors in the range of
0 to nfds-
1.
</quote>
957 It stands to reason, therefore, that we might wish to
958 maintain a minimal value for
<varname>nfds
</varname>, and
959 priority queues immediately come to mind. Note, though, that
960 when a socket is closed, the minimal file description might
961 change; in the absence of an efficient means to erase an
962 arbitrary value from a priority queue, we might as well
963 avoid its use altogether.
967 The standard containers typically support iterators. It is
969 for
<classname>std::priority_queue
</classname> to omit them
970 (See
<xref linkend=
"biblio.meyers01stl"/>). One might
971 ask why do priority queues need to support iterators, since
972 they are self-organizing containers with a different purpose
973 than abstracting sequences. There are several reasons:
978 Iterators (even in self-organizing containers) are
979 useful for many purposes: cross-referencing
980 containers, serialization, and debugging code that uses
987 The standard library's hash-based containers support
988 iterators, even though they too are self-organizing
989 containers with a different purpose than abstracting
996 In standard-library-like containers, it is natural to specify the
997 interface of operations for modifying a value or erasing
998 a value (discussed previously) in terms of a iterators.
999 It should be noted that the standard
1000 containers also use iterators for accessing and
1001 manipulating a specific value. In hash-based
1002 containers, one checks the existence of a key by
1003 comparing the iterator returned by
<function>find
</function> to the
1004 iterator returned by
<function>end
</function>, and not by comparing a
1005 pointer returned by
<function>find
</function> to
<type>NULL
</type>.
1014 <section xml:
id=
"motivation.priority_queue.underlying">
1015 <info><title>Underlying Data Structures
</title></info>
1018 There are three main implementations of priority queues: the
1019 first employs a binary heap, typically one which uses a
1020 sequence; the second uses a tree (or forest of trees), which is
1021 typically less structured than an associative container's tree;
1022 the third simply uses an associative container. These are
1023 shown in the figure below with labels A1 and A2, B, and C.
1027 <title>Underlying Priority Queue Data Structures
</title>
1030 <imagedata align=
"center" format=
"PNG" scale=
"100"
1031 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_different_underlying_dss_2.png"/>
1034 <phrase>Underlying Priority Queue Data Structures
</phrase>
1040 No single implementation can completely replace any of the
1041 others. Some have better
<function>push
</function>
1042 and
<function>pop
</function> amortized performance, some have
1043 better bounded (worst case) response time than others, some
1044 optimize a single method at the expense of others, etc. In
1045 general the
"best" implementation is dictated by the specific
1050 As with associative containers, the more implementations
1051 co-exist, the more necessary a traits mechanism is for handling
1052 generic containers safely and efficiently. This is especially
1053 important for priority queues, since the invalidation guarantees
1054 of one of the most useful data structures - binary heaps - is
1055 markedly different than those of most of the others.
1060 <section xml:
id=
"motivation.priority_queue.binary_heap">
1061 <info><title>Binary Heaps
</title></info>
1065 Binary heaps are one of the most useful underlying
1066 data structures for priority queues. They are very efficient in
1067 terms of memory (since they don't require per-value structure
1068 metadata), and have the best amortized
<function>push
</function> and
1069 <function>pop
</function> performance for primitive types like
1074 The standard library's
<classname>priority_queue
</classname>
1075 implements this data structure as an adapter over a sequence,
1077 <classname>std::vector
</classname>
1078 or
<classname>std::deque
</classname>, which correspond to labels
1079 A1 and A2 respectively in the graphic above.
1083 This is indeed an elegant example of the adapter concept and
1084 the algorithm/container/iterator decomposition. (See
<xref linkend=
"biblio.nelson96stlpq"/>). There are
1085 several reasons why a binary-heap priority queue
1086 may be better implemented as a container instead of a
1093 <classname>std::priority_queue
</classname> cannot erase values
1094 from its adapted sequence (irrespective of the sequence
1095 type). This means that the memory use of
1096 an
<classname>std::priority_queue
</classname> object is always
1097 proportional to the maximal number of values it ever contained,
1098 and not to the number of values that it currently
1099 contains. (See
<filename>performance/priority_queue_text_pop_mem_usage.cc
</filename>.)
1100 This implementation of binary heaps acts very differently than
1101 other underlying data structures (See also pairing heaps).
1107 Some combinations of adapted sequences and value types
1108 are very inefficient or just don't make sense. If one uses
1109 <classname>std::priority_queue
<std::vector
<std::string
>
1110 > ></classname>, for example, then not only will each
1111 operation perform a logarithmic number of
1112 <classname>std::string
</classname> assignments, but, furthermore, any
1113 operation (including
<function>pop
</function>) can render the container
1114 useless due to exceptions. Conversely, if one uses
1115 <classname>std::priority_queue
<std::deque
<int
> >
1116 ></classname>, then each operation uses incurs a logarithmic
1117 number of indirect accesses (through pointers) unnecessarily.
1118 It might be better to let the container make a conservative
1119 deduction whether to use the structure in the graphic above, labels A1 or A2.
1125 There does not seem to be a systematic way to determine
1126 what exactly can be done with the priority queue.
1131 If
<classname>p
</classname> is a priority queue adapting an
1132 <classname>std::vector
</classname>, then it is possible to iterate over
1133 all values by using
<function>&p.top()
</function> and
1134 <function>&p.top() + p.size()
</function>, but this will not work
1135 if
<varname>p
</varname> is adapting an
<classname>std::deque
</classname>; in any
1136 case, one cannot use
<classname>p.begin()
</classname> and
1137 <classname>p.end()
</classname>. If a different sequence is adapted, it
1138 is even more difficult to determine what can be
1145 If
<varname>p
</varname> is a priority queue adapting an
1146 <classname>std::deque
</classname>, then the reference return by
1152 will remain valid until it is popped,
1153 but if
<varname>p
</varname> adapts an
<classname>std::vector
</classname>, the
1154 next
<function>push
</function> will invalidate it. If a different
1155 sequence is adapted, it is even more difficult to
1156 determine what can be done.
1164 Sequence-based binary heaps can still implement
1165 linear-time
<function>erase
</function> and
<function>modify
</function> operations.
1166 This means that if one needs to erase a small
1167 (say logarithmic) number of values, then one might still
1168 choose this underlying data structure. Using
1169 <classname>std::priority_queue
</classname>, however, this will generally
1170 change the order of growth of the entire sequence of
1178 </section> <!-- goals/motivation -->
1179 </section> <!-- intro -->
1182 <section xml:
id=
"containers.pbds.using">
1183 <info><title>Using
</title></info>
1184 <?dbhtml filename=
"policy_data_structures_using.html"?>
1186 <section xml:
id=
"pbds.using.prereq">
1187 <info><title>Prerequisites
</title></info>
1189 <para>The library contains only header files, and does not require any
1190 other libraries except the standard C++ library . All classes are
1191 defined in namespace
<code>__gnu_pbds
</code>. The library internally
1192 uses macros beginning with
<code>PB_DS
</code>, but
1193 <code>#undef
</code>s anything it
<code>#define
</code>s (except for
1194 header guards). Compiling the library in an environment where macros
1195 beginning in
<code>PB_DS
</code> are defined, may yield unpredictable
1196 results in compilation, execution, or both.
</para>
1199 Further dependencies are necessary to create the visual output for the
1200 performance tests. To create these graphs, two additional packages
1201 will be needed:
<command>pychart
</command> and
<command>Beautiful
1206 <section xml:
id=
"pbds.using.organization">
1207 <info><title>Organization
</title></info>
1210 The various data structures are organized as follows.
1222 <classname>basic_branch
</classname>
1223 is an abstract base class for branched-based
1224 associative-containers
1230 <classname>tree
</classname>
1231 is a concrete base class for tree-based
1232 associative-containers
1238 <classname>trie
</classname>
1239 is a concrete base class trie-based
1240 associative-containers
1253 <classname>basic_hash_table
</classname>
1254 is an abstract base class for hash-based
1255 associative-containers
1261 <classname>cc_hash_table
</classname>
1262 is a concrete collision-chaining hash-based
1263 associative-containers
1269 <classname>gp_hash_table
</classname>
1270 is a concrete (general) probing hash-based
1271 associative-containers
1284 <classname>list_update
</classname>
1285 list-based update-policy associative container
1297 <classname>priority_queue
</classname>
1306 The hierarchy is composed naturally so that commonality is
1307 captured by base classes. Thus
<function>operator[]
</function>
1308 is defined at the base of any hierarchy, since all derived
1309 containers support it. Conversely
<function>split
</function> is
1310 defined in
<classname>basic_branch
</classname>, since only
1311 tree-like containers support it.
1315 In addition, there are the following diagnostics classes,
1316 used to report errors specific to this library's data
1321 <title>Exception Hierarchy
</title>
1324 <imagedata align=
"center" format=
"PNG" scale=
"100"
1325 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_exception_hierarchy.png"/>
1328 <imagedata align=
"center" format=
"PDF" scale=
"50"
1329 fileref=
"../images/pbds_exception_hierarchy.pdf"/>
1332 <phrase>Exception Hierarchy
</phrase>
1339 <section xml:
id=
"pbds.using.tutorial">
1340 <info><title>Tutorial
</title></info>
1342 <section xml:
id=
"pbds.using.tutorial.basic">
1343 <info><title>Basic Use
</title></info>
1346 For the most part, the policy-based containers containers in
1347 namespace
<literal>__gnu_pbds
</literal> have the same interface as
1348 the equivalent containers in the standard C++ library, except for
1349 the names used for the container classes themselves. For example,
1350 this shows basic operations on a collision-chaining hash-based
1354 #include
<ext/pb_ds/assoc_container.h
>
1358 __gnu_pbds::cc_hash_table
<int, char
> c;
1360 assert(c.find(
1) == c.end());
1365 The container is called
1366 <classname>__gnu_pbds::cc_hash_table
</classname> instead of
1367 <classname>std::unordered_map
</classname>, since
<quote>unordered
1368 map
</quote> does not necessarily mean a hash-based map as implied by
1369 the C++ library (C++
0x or TR1). For example, list-based associative
1370 containers, which are very useful for the construction of
1371 "multimaps," are also unordered.
1374 <para>This snippet shows a red-black tree based container:
</para>
1377 #include
<ext/pb_ds/assoc_container.h
>
1381 __gnu_pbds::tree
<int, char
> c;
1383 assert(c.find(
2) != c.end());
1387 <para>The container is called
<classname>tree
</classname> instead of
1388 <classname>map
</classname> since the underlying data structures are
1389 being named with specificity.
1393 The member function naming convention is to strive to be the same as
1394 the equivalent member functions in other C++ standard library
1395 containers. The familiar methods are unchanged:
1396 <function>begin
</function>,
<function>end
</function>,
1397 <function>size
</function>,
<function>empty
</function>, and
1398 <function>clear
</function>.
1402 This isn't to say that things are exactly as one would expect, given
1403 the container requirments and interfaces in the C++ standard.
1407 The names of containers' policies and policy accessors are
1408 different then the usual. For example, if
<type>hash_type
</type> is
1409 some type of hash-based container, then
</para>
1416 gives the type of its hash functor, and if
<varname>obj
</varname> is
1417 some hash-based container object, then
1424 <para>will return a reference to its hash-functor object.
</para>
1428 Similarly, if
<type>tree_type
</type> is some type of tree-based
1437 gives the type of its comparison functor, and if
1438 <varname>obj
</varname> is some tree-based container object,
1446 <para>will return a reference to its comparison-functor object.
</para>
1449 It would be nice to give names consistent with those in the existing
1450 C++ standard (inclusive of TR1). Unfortunately, these standard
1451 containers don't consistently name types and methods. For example,
1452 <classname>std::tr1::unordered_map
</classname> uses
1453 <type>hasher
</type> for the hash functor, but
1454 <classname>std::map
</classname> uses
<type>key_compare
</type> for
1455 the comparison functor. Also, we could not find an accessor for
1456 <classname>std::tr1::unordered_map
</classname>'s hash functor, but
1457 <classname>std::map
</classname> uses
<classname>compare
</classname>
1458 for accessing the comparison functor.
1462 Instead,
<literal>__gnu_pbds
</literal> attempts to be internally
1463 consistent, and uses standard-derived terminology if possible.
1467 Another source of difference is in scope:
1468 <literal>__gnu_pbds
</literal> contains more types of associative
1469 containers than the standard C++ library, and more opportunities
1470 to configure these new containers, since different types of
1471 associative containers are useful in different settings.
1475 Namespace
<literal>__gnu_pbds
</literal> contains different classes for
1476 hash-based containers, tree-based containers, trie-based containers,
1477 and list-based containers.
1481 Since associative containers share parts of their interface, they
1482 are organized as a class hierarchy.
1485 <para>Each type or method is defined in the most-common ancestor
1486 in which it makes sense.
1489 <para>For example, all associative containers support iteration
1490 expressed in the following form:
1508 But not all containers contain or use hash functors. Yet, both
1509 collision-chaining and (general) probing hash-based associative
1510 containers have a hash functor, so
1511 <classname>basic_hash_table
</classname> contains the interface:
1516 get_hash_fn() const;
1523 so all hash-based associative containers inherit the same
1524 hash-functor accessor methods.
1527 </section> <!--basic use -->
1529 <section xml:
id=
"pbds.using.tutorial.configuring">
1532 Configuring via Template Parameters
1537 In general, each of this library's containers is
1538 parametrized by more policies than those of the standard library. For
1539 example, the standard hash-based container is parametrized as
1543 template
<typename Key, typename Mapped, typename Hash,
1544 typename Pred, typename Allocator, bool Cache_Hashe_Code
>
1545 class unordered_map;
1549 and so can be configured by key type, mapped type, a functor
1550 that translates keys to unsigned integral types, an equivalence
1551 predicate, an allocator, and an indicator whether to store hash
1552 values with each entry. this library's collision-chaining
1553 hash-based container is parametrized as
1556 template
<typename Key, typename Mapped, typename Hash_Fn,
1557 typename Eq_Fn, typename Comb_Hash_Fn,
1558 typename Resize_Policy, bool Store_Hash
1559 typename Allocator
>
1560 class cc_hash_table;
1564 and so can be configured by the first four types of
1565 <classname>std::tr1::unordered_map
</classname>, then a
1566 policy for translating the key-hash result into a position
1567 within the table, then a policy by which the table resizes,
1568 an indicator whether to store hash values with each entry,
1569 and an allocator (which is typically the last template
1570 parameter in standard containers).
1574 Nearly all policy parameters have default values, so this
1575 need not be considered for casual use. It is important to
1576 note, however, that hash-based containers' policies can
1577 dramatically alter their performance in different settings,
1578 and that tree-based containers' policies can make them
1579 useful for other purposes than just look-up.
1583 <para>As opposed to associative containers, priority queues have
1584 relatively few configuration options. The priority queue is
1585 parametrized as follows:
</para>
1587 template
<typename Value_Type, typename Cmp_Fn,typename Tag,
1588 typename Allocator
>
1589 class priority_queue;
1592 <para>The
<classname>Value_Type
</classname>,
<classname>Cmp_Fn
</classname>, and
1593 <classname>Allocator
</classname> parameters are the container's value type,
1594 comparison-functor type, and allocator type, respectively;
1595 these are very similar to the standard's priority queue. The
1596 <classname>Tag
</classname> parameter is different: there are a number of
1597 pre-defined tag types corresponding to binary heaps, binomial
1598 heaps, etc., and
<classname>Tag
</classname> should be instantiated
1599 by one of them.
</para>
1601 <para>Note that as opposed to the
1602 <classname>std::priority_queue
</classname>,
1603 <classname>__gnu_pbds::priority_queue
</classname> is not a
1604 sequence-adapter; it is a regular container.
</para>
1608 <section xml:
id=
"pbds.using.tutorial.traits">
1611 Querying Container Attributes
1616 <para>A containers underlying data structure
1617 affect their performance; Unfortunately, they can also affect
1618 their interface. When manipulating generically associative
1619 containers, it is often useful to be able to statically
1620 determine what they can support and what the cannot.
1623 <para>Happily, the standard provides a good solution to a similar
1624 problem - that of the different behavior of iterators. If
1625 <classname>It
</classname> is an iterator, then
1628 typename std::iterator_traits
<It
>::iterator_category
1631 <para>is one of a small number of pre-defined tag classes, and
1634 typename std::iterator_traits
<It
>::value_type
1637 <para>is the value type to which the iterator
"points".
</para>
1640 Similarly, in this library, if
<type>C
</type> is a
1641 container, then
<classname>container_traits
</classname> is a
1642 trait class that stores information about the kind of
1643 container that is implemented.
1646 typename container_traits
<C
>::container_category
1649 is one of a small number of predefined tag structures that
1650 uniquely identifies the type of underlying data structure.
1653 <para>In most cases, however, the exact underlying data
1654 structure is not really important, but what is important is
1655 one of its other attributes: whether it guarantees storing
1656 elements by key order, for example. For this one can
1659 typename container_traits
<C
>::order_preserving
1665 typename container_traits
<C
>::invalidation_guarantee
1668 <para>is the container's invalidation guarantee. Invalidation
1669 guarantees are especially important regarding priority queues,
1670 since in this library's design, iterators are practically the
1671 only way to manipulate them.
</para>
1674 <section xml:
id=
"pbds.using.tutorial.point_range_iteration">
1677 Point and Range Iteration
1682 <para>This library differentiates between two types of methods
1683 and iterators: point-type, and range-type. For example,
1684 <function>find
</function> and
<function>insert
</function> are point-type methods, since
1685 they each deal with a specific element; their returned
1686 iterators are point-type iterators.
<function>begin
</function> and
1687 <function>end
</function> are range-type methods, since they are not used to
1688 find a specific element, but rather to go over all elements in
1689 a container object; their returned iterators are range-type
1693 <para>Most containers store elements in an order that is
1694 determined by their interface. Correspondingly, it is fine that
1695 their point-type iterators are synonymous with their range-type
1696 iterators. For example, in the following snippet
1699 std::for_each(c.find(
1), c.find(
5), foo);
1702 two point-type iterators (returned by
<function>find
</function>) are used
1703 for a range-type purpose - going over all elements whose key is
1708 Conversely, the above snippet makes no sense for
1709 self-organizing containers - ones that order (and reorder)
1710 their elements by implementation. It would be nice to have a
1711 uniform iterator system that would allow the above snippet to
1712 compile only if it made sense.
1716 This could trivially be done by specializing
1717 <function>std::for_each
</function> for the case of iterators returned by
1718 <classname>std::tr1::unordered_map
</classname>, but this would only solve the
1719 problem for one algorithm and one container. Fundamentally, the
1720 problem is that one can loop using a self-organizing
1721 container's point-type iterators.
1725 This library's containers define two families of
1726 iterators:
<type>point_const_iterator
</type> and
1727 <type>point_iterator
</type> are the iterator types returned by
1728 point-type methods;
<type>const_iterator
</type> and
1729 <type>iterator
</type> are the iterator types returned by range-type
1733 class
<- some container -
>
1738 typedef
<- something -
> const_iterator;
1740 typedef
<- something -
> iterator;
1742 typedef
<- something -
> point_const_iterator;
1744 typedef
<- something -
> point_iterator;
1751 const_iterator begin () const;
1755 point_const_iterator find(...) const;
1757 point_iterator find(...);
1762 containers whose interface defines sequence order , it
1763 is very simple: point-type and range-type iterators are exactly
1764 the same, which means that the above snippet will compile if it
1765 is used for an order-preserving associative container.
1769 For self-organizing containers, however, (hash-based
1770 containers as a special example), the preceding snippet will
1771 not compile, because their point-type iterators do not support
1772 <function>operator++
</function>.
1775 <para>In any case, both for order-preserving and self-organizing
1776 containers, the following snippet will compile:
1779 typename Cntnr::point_iterator it = c.find(
2);
1783 because a range-type iterator can always be converted to a
1784 point-type iterator.
1787 <para>Distingushing between iterator types also
1788 raises the point that a container's iterators might have
1789 different invalidation rules concerning their de-referencing
1790 abilities and movement abilities. This now corresponds exactly
1791 to the question of whether point-type and range-type iterators
1792 are valid. As explained above,
<classname>container_traits
</classname> allows
1793 querying a container for its data structure attributes. The
1794 iterator-invalidation guarantees are certainly a property of
1795 the underlying data structure, and so
1798 container_traits
<C
>::invalidation_guarantee
1802 gives one of three pre-determined types that answer this
1807 </section> <!-- tutorial -->
1809 <section xml:
id=
"pbds.using.examples">
1810 <info><title>Examples
</title></info>
1812 Additional code examples are provided in the source
1813 distribution, as part of the regression and performance
1817 <section xml:
id=
"pbds.using.examples.basic">
1818 <info><title>Intermediate Use
</title></info>
1824 <filename>basic_map.cc
</filename>
1831 <filename>basic_set.cc
</filename>
1837 Conditionally erasing values from an associative container object:
1838 <filename>erase_if.cc
</filename>
1844 Basic use of multimaps:
1845 <filename>basic_multimap.cc
</filename>
1851 Basic use of multisets:
1852 <filename>basic_multiset.cc
</filename>
1858 Basic use of priority queues:
1859 <filename>basic_priority_queue.cc
</filename>
1865 Splitting and joining priority queues:
1866 <filename>priority_queue_split_join.cc
</filename>
1872 Conditionally erasing values from a priority queue:
1873 <filename>priority_queue_erase_if.cc
</filename>
1880 <section xml:
id=
"pbds.using.examples.query">
1881 <info><title>Querying with
<classname>container_traits
</classname> </title></info>
1885 Using
<classname>container_traits
</classname> to query
1886 about underlying data structure behavior:
1887 <filename>assoc_container_traits.cc
</filename>
1893 A non-compiling example showing wrong use of finding keys in
1894 hash-based containers:
<filename>hash_find_neg.cc
</filename>
1899 Using
<classname>container_traits
</classname>
1900 to query about underlying data structure behavior:
1901 <filename>priority_queue_container_traits.cc
</filename>
1909 <section xml:
id=
"pbds.using.examples.container">
1910 <info><title>By Container Method
</title></info>
1913 <section xml:
id=
"pbds.using.examples.container.hash">
1914 <info><title>Hash-Based
</title></info>
1916 <section xml:
id=
"pbds.using.examples.container.hash.resize">
1917 <info><title>size Related
</title></info>
1922 Setting the initial size of a hash-based container
1924 <filename>hash_initial_size.cc
</filename>
1930 A non-compiling example showing how not to resize a
1931 hash-based container object:
1932 <filename>hash_resize_neg.cc
</filename>
1938 Resizing the size of a hash-based container object:
1939 <filename>hash_resize.cc
</filename>
1945 Showing an illegal resize of a hash-based container
1947 <filename>hash_illegal_resize.cc
</filename>
1953 Changing the load factors of a hash-based container
1954 object:
<filename>hash_load_set_change.cc
</filename>
1960 <section xml:
id=
"pbds.using.examples.container.hash.hashor">
1961 <info><title>Hashing Function Related
</title></info>
1967 Using a modulo range-hashing function for the case of an
1968 unknown skewed key distribution:
1969 <filename>hash_mod.cc
</filename>
1975 Writing a range-hashing functor for the case of a known
1976 skewed key distribution:
1977 <filename>shift_mask.cc
</filename>
1983 Storing the hash value along with each key:
1984 <filename>store_hash.cc
</filename>
1990 Writing a ranged-hash functor:
1991 <filename>ranged_hash.cc
</filename>
2000 <section xml:
id=
"pbds.using.examples.container.branch">
2001 <info><title>Branch-Based
</title></info>
2004 <section xml:
id=
"pbds.using.examples.container.branch.split">
2005 <info><title>split or join Related
</title></info>
2010 Joining two tree-based container objects:
2011 <filename>tree_join.cc
</filename>
2017 Splitting a PATRICIA trie container object:
2018 <filename>trie_split.cc
</filename>
2024 Order statistics while joining two tree-based container
2026 <filename>tree_order_statistics_join.cc
</filename>
2033 <section xml:
id=
"pbds.using.examples.container.branch.invariants">
2034 <info><title>Node Invariants
</title></info>
2039 Using trees for order statistics:
2040 <filename>tree_order_statistics.cc
</filename>
2046 Augmenting trees to support operations on line
2048 <filename>tree_intervals.cc
</filename>
2055 <section xml:
id=
"pbds.using.examples.container.branch.trie">
2056 <info><title>trie
</title></info>
2060 Using a PATRICIA trie for DNA strings:
2061 <filename>trie_dna.cc
</filename>
2068 trie for finding all entries whose key matches a given prefix:
2069 <filename>trie_prefix_search.cc
</filename>
2078 <section xml:
id=
"pbds.using.examples.container.priority_queue">
2079 <info><title>Priority Queues
</title></info>
2083 Cross referencing an associative container and a priority
2084 queue:
<filename>priority_queue_xref.cc
</filename>
2090 Cross referencing a vector and a priority queue using a
2091 very simple version of Dijkstra's shortest path
2093 <filename>priority_queue_dijkstra.cc
</filename>
2105 </section> <!-- using -->
2107 <!-- S03: Design -->
2110 <section xml:
id=
"containers.pbds.design">
2111 <info><title>Design
</title></info>
2112 <?dbhtml filename=
"policy_data_structures_design.html"?>
2115 <section xml:
id=
"pbds.design.concepts">
2116 <info><title>Concepts
</title></info>
2118 <section xml:
id=
"pbds.design.concepts.null_type">
2119 <info><title>Null Policy Classes
</title></info>
2122 Associative containers are typically parametrized by various
2123 policies. For example, a hash-based associative container is
2124 parametrized by a hash-functor, transforming each key into an
2125 non-negative numerical type. Each such value is then further mapped
2126 into a position within the table. The mapping of a key into a
2127 position within the table is therefore a two-step process.
2131 In some cases, instantiations are redundant. For example, when the
2132 keys are integers, it is possible to use a redundant hash policy,
2133 which transforms each key into its value.
2137 In some other cases, these policies are irrelevant. For example, a
2138 hash-based associative container might transform keys into positions
2139 within a table by a different method than the two-step method
2140 described above. In such a case, the hash functor is simply
2145 When a policy is either redundant or irrelevant, it can be replaced
2146 by
<classname>null_type
</classname>.
2150 For example, a
<emphasis>set
</emphasis> is an associative
2151 container with one of its template parameters (the one for the
2152 mapped type) replaced with
<classname>null_type
</classname>. Other
2153 places simplifications are made possible with this technique
2154 include node updates in tree and trie data structures, and hash
2155 and probe functions for hash data structures.
2159 <section xml:
id=
"pbds.design.concepts.associative_semantics">
2160 <info><title>Map and Set Semantics
</title></info>
2162 <section xml:
id=
"concepts.associative_semantics.set_vs_map">
2165 Distinguishing Between Maps and Sets
2170 Anyone familiar with the standard knows that there are four kinds
2171 of associative containers: maps, sets, multimaps, and
2172 multisets. The map datatype associates each key to
2177 Sets are associative containers that simply store keys -
2178 they do not map them to anything. In the standard, each map class
2179 has a corresponding set class. E.g.,
2180 <classname>std::map
<int, char
></classname> maps each
2181 <classname>int
</classname> to a
<classname>char
</classname>, but
2182 <classname>std::set
<int, char
></classname> simply stores
2183 <classname>int
</classname>s. In this library, however, there are no
2184 distinct classes for maps and sets. Instead, an associative
2185 container's
<classname>Mapped
</classname> template parameter is a policy: if
2186 it is instantiated by
<classname>null_type
</classname>, then it
2187 is a
"set"; otherwise, it is a
"map". E.g.,
2190 cc_hash_table
<int, char
>
2193 is a
"map" mapping each
<type>int
</type> value to a
<type>
2197 cc_hash_table
<int, null_type
>
2200 is a type that uniquely stores
<type>int
</type> values.
2202 <para>Once the
<classname>Mapped
</classname> template parameter is instantiated
2203 by
<classname>null_type
</classname>, then
2204 the
"set" acts very similarly to the standard's sets - it does not
2205 map each key to a distinct
<classname>null_type
</classname> object. Also,
2206 , the container's
<type>value_type
</type> is essentially
2207 its
<type>key_type
</type> - just as with the standard's sets
2211 The standard's multimaps and multisets allow, respectively,
2212 non-uniquely mapping keys and non-uniquely storing keys. As
2214 reasons why this might be necessary are
1) that a key might be
2215 decomposed into a primary key and a secondary key,
2) that a
2216 key might appear more than once, or
3) any arbitrary
2217 combination of
1)s and
2)s. Correspondingly,
2218 one should use
1)
"maps" mapping primary keys to secondary
2219 keys,
2)
"maps" mapping keys to size types, or
3) any arbitrary
2220 combination of
1)s and
2)s. Thus, for example, an
2221 <classname>std::multiset
<int
></classname> might be used to store
2222 multiple instances of integers, but using this library's
2223 containers, one might use
2226 tree
<int, size_t
>
2230 i.e., a
<classname>map
</classname> of
<type>int
</type>s to
2231 <type>size_t
</type>s.
2234 These
"multimaps" and
"multisets" might be confusing to
2235 anyone familiar with the standard's
<classname>std::multimap
</classname> and
2236 <classname>std::multiset
</classname>, because there is no clear
2237 correspondence between the two. For example, in some cases
2238 where one uses
<classname>std::multiset
</classname> in the standard, one might use
2239 in this library a
"multimap" of
"multisets" - i.e., a
2240 container that maps primary keys each to an associative
2241 container that maps each secondary key to the number of times
2246 When one uses a
"multimap," one should choose with care the
2247 type of container used for secondary keys.
2249 </section> <!-- map vs set -->
2252 <section xml:
id=
"concepts.associative_semantics.multi">
2253 <info><title>Alternatives to
<classname>std::multiset
</classname> and
<classname>std::multimap
</classname></title></info>
2256 Brace onself: this library does not contain containers like
2257 <classname>std::multimap
</classname> or
2258 <classname>std::multiset
</classname>. Instead, these data
2259 structures can be synthesized via manipulation of the
2260 <classname>Mapped
</classname> template parameter.
2263 One maps the unique part of a key - the primary key, into an
2264 associative-container of the (originally) non-unique parts of
2265 the key - the secondary key. A primary associative-container
2266 is an associative container of primary keys; a secondary
2267 associative-container is an associative container of
2272 Stepping back a bit, and starting in from the beginning.
2277 Maps (or sets) allow mapping (or storing) unique-key values.
2278 The standard library also supplies associative containers which
2279 map (or store) multiple values with equivalent keys:
2280 <classname>std::multimap
</classname>,
<classname>std::multiset
</classname>,
2281 <classname>std::tr1::unordered_multimap
</classname>, and
2282 <classname>unordered_multiset
</classname>. We first discuss how these might
2283 be used, then why we think it is best to avoid them.
2287 Suppose one builds a simple bank-account application that
2288 records for each client (identified by an
<classname>std::string
</classname>)
2289 and account-id (marked by an
<type>unsigned long
</type>) -
2290 the balance in the account (described by a
2291 <type>float
</type>). Suppose further that ordering this
2292 information is not useful, so a hash-based container is
2293 preferable to a tree based container. Then one can use
2297 std::tr1::unordered_map
<std::pair
<std::string, unsigned long
>, float, ...
>
2301 which hashes every combination of client and account-id. This
2302 might work well, except for the fact that it is now impossible
2303 to efficiently list all of the accounts of a specific client
2304 (this would practically require iterating over all
2305 entries). Instead, one can use
2309 std::tr1::unordered_multimap
<std::pair
<std::string, unsigned long
>, float, ...
>
2313 which hashes every client, and decides equivalence based on
2314 client only. This will ensure that all accounts belonging to a
2315 specific user are stored consecutively.
2319 Also, suppose one wants an integers' priority queue
2320 (a container that supports
<function>push
</function>,
2321 <function>pop
</function>, and
<function>top
</function> operations, the last of which
2322 returns the largest
<type>int
</type>) that also supports
2323 operations such as
<function>find
</function> and
<function>lower_bound
</function>. A
2324 reasonable solution is to build an adapter over
2325 <classname>std::set
<int
></classname>. In this adapter,
2326 <function>push
</function> will just call the tree-based
2327 associative container's
<function>insert
</function> method;
<function>pop
</function>
2328 will call its
<function>end
</function> method, and use it to return the
2329 preceding element (which must be the largest). Then this might
2330 work well, except that the container object cannot hold
2331 multiple instances of the same integer (
<function>push(
4)
</function>,
2332 will be a no-op if
<constant>4</constant> is already in the
2333 container object). If multiple keys are necessary, then one
2334 might build the adapter over an
2335 <classname>std::multiset
<int
></classname>.
2339 The standard library's non-unique-mapping containers are useful
2340 when (
1) a key can be decomposed in to a primary key and a
2341 secondary key, (
2) a key is needed multiple times, or (
3) any
2342 combination of (
1) and (
2).
2346 The graphic below shows how the standard library's container
2347 design works internally; in this figure nodes shaded equally
2348 represent equivalent-key values. Equivalent keys are stored
2349 consecutively using the properties of the underlying data
2350 structure: binary search trees (label A) store equivalent-key
2351 values consecutively (in the sense of an in-order walk)
2352 naturally; collision-chaining hash tables (label B) store
2353 equivalent-key values in the same bucket, the bucket can be
2354 arranged so that equivalent-key values are consecutive.
2358 <title>Non-unique Mapping Standard Containers
</title>
2361 <imagedata align=
"center" format=
"PNG" scale=
"100"
2362 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_1.png"/>
2365 <phrase>Non-unique Mapping Standard Containers
</phrase>
2371 Put differently, the standards' non-unique mapping
2372 associative-containers are associative containers that map
2373 primary keys to linked lists that are embedded into the
2374 container. The graphic below shows again the two
2375 containers from the first graphic above, this time with
2376 the embedded linked lists of the grayed nodes marked
2380 <figure xml:
id=
"fig.pbds_embedded_lists_2">
2382 Effect of embedded lists in
2383 <classname>std::multimap
</classname>
2387 <imagedata align=
"center" format=
"PNG" scale=
"100"
2388 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_2.png"/>
2392 Effect of embedded lists in
2393 <classname>std::multimap
</classname>
2400 These embedded linked lists have several disadvantages.
2406 The underlying data structure embeds the linked lists
2407 according to its own consideration, which means that the
2408 search path for a value might include several different
2409 equivalent-key values. For example, the search path for the
2410 the black node in either of the first graphic, labels A or B,
2411 includes more than a single gray node.
2417 The links of the linked lists are the underlying data
2418 structures' nodes, which typically are quite structured. In
2419 the case of tree-based containers (the grapic above, label
2420 B), each
"link" is actually a node with three pointers (one
2421 to a parent and two to children), and a
2422 relatively-complicated iteration algorithm. The linked
2423 lists, therefore, can take up quite a lot of memory, and
2424 iterating over all values equal to a given key (through the
2425 return value of the standard
2426 library's
<function>equal_range
</function>) can be
2433 The primary key is stored multiply; this uses more memory.
2439 Finally, the interface of this design excludes several
2440 useful underlying data structures. Of all the unordered
2441 self-organizing data structures, practically only
2442 collision-chaining hash tables can (efficiently) guarantee
2443 that equivalent-key values are stored consecutively.
2449 The above reasons hold even when the ratio of secondary keys to
2450 primary keys (or average number of identical keys) is small, but
2451 when it is large, there are more severe problems:
2457 The underlying data structures order the links inside each
2458 embedded linked-lists according to their internal
2459 considerations, which effectively means that each of the
2460 links is unordered. Irrespective of the underlying data
2461 structure, searching for a specific value can degrade to
2468 Similarly to the above point, it is impossible to apply
2469 to the secondary keys considerations that apply to primary
2470 keys. For example, it is not possible to maintain secondary
2471 keys by sorted order.
2477 While the interface
"understands" that all equivalent-key
2478 values constitute a distinct list (through
2479 <function>equal_range
</function>), the underlying data
2480 structure typically does not. This means that operations such
2481 as erasing from a tree-based container all values whose keys
2482 are equivalent to a a given key can be super-linear in the
2483 size of the tree; this is also true also for several other
2484 operations that target a specific list.
2491 In this library, all associative containers map
2492 (or store) unique-key values. One can (
1) map primary keys to
2493 secondary associative-containers (containers of
2494 secondary keys) or non-associative containers (
2) map identical
2495 keys to a size-type representing the number of times they
2496 occur, or (
3) any combination of (
1) and (
2). Instead of
2497 allowing multiple equivalent-key values, this library
2498 supplies associative containers based on underlying
2499 data structures that are suitable as secondary
2500 associative-containers.
2504 In the figure below, labels A and B show the equivalent
2505 underlying data structures in this library, as mapped to the
2506 first graphic above. Labels A and B, respectively. Each shaded
2507 box represents some size-type or secondary
2508 associative-container.
2512 <title>Non-unique Mapping Containers
</title>
2515 <imagedata align=
"center" format=
"PNG" scale=
"100"
2516 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_3.png"/>
2519 <phrase>Non-unique Mapping Containers
</phrase>
2525 In the first example above, then, one would use an associative
2526 container mapping each user to an associative container which
2527 maps each application id to a start time (see
2528 <filename>example/basic_multimap.cc
</filename>); in the second
2529 example, one would use an associative container mapping
2530 each
<classname>int
</classname> to some size-type indicating the
2531 number of times it logically occurs
2532 (see
<filename>example/basic_multiset.cc
</filename>.
2536 See the discussion in list-based container types for containers
2537 especially suited as secondary associative-containers.
2541 </section> <!-- map and set semantics -->
2543 <section xml:
id=
"pbds.design.concepts.iterator_semantics">
2544 <info><title>Iterator Semantics
</title></info>
2546 <section xml:
id=
"concepts.iterator_semantics.point_and_range">
2547 <info><title>Point and Range Iterators
</title></info>
2550 Iterator concepts are bifurcated in this design, and are
2551 comprised of point-type and range-type iteration.
2555 A point-type iterator is an iterator that refers to a specific
2556 element as returned through an
2557 associative-container's
<function>find
</function> method.
2561 A range-type iterator is an iterator that is used to go over a
2562 sequence of elements, as returned by a container's
2563 <function>find
</function> method.
2567 A point-type method is a method that
2568 returns a point-type iterator; a range-type method is a method
2569 that returns a range-type iterator.
2572 <para>For most containers, these types are synonymous; for
2573 self-organizing containers, such as hash-based containers or
2574 priority queues, these are inherently different (in any
2575 implementation, including that of C++ standard library
2576 components), but in this design, it is made explicit. They are
2582 <section xml:
id=
"concepts.iterator_semantics.both">
2583 <info><title>Distinguishing Point and Range Iterators
</title></info>
2585 <para>When using this library, is necessary to differentiate
2586 between two types of methods and iterators: point-type methods and
2587 iterators, and range-type methods and iterators. Each associative
2588 container's interface includes the methods:
</para>
2590 point_const_iterator
2591 find(const_key_reference r_key) const;
2594 find(const_key_reference r_key);
2596 std::pair
<point_iterator,bool
>
2597 insert(const_reference r_val);
2600 <para>The relationship between these iterator types varies between
2601 container types. The figure below
2602 shows the most general invariant between point-type and
2603 range-type iterators: In
<emphasis>A
</emphasis> <literal>iterator
</literal>, can
2604 always be converted to
<literal>point_iterator
</literal>. In
<emphasis>B
</emphasis>
2605 shows invariants for order-preserving containers: point-type
2606 iterators are synonymous with range-type iterators.
2607 Orthogonally,
<emphasis>C
</emphasis>shows invariants for
"set"
2608 containers: iterators are synonymous with const iterators.
</para>
2611 <title>Point Iterator Hierarchy
</title>
2614 <imagedata align=
"center" format=
"PNG" scale=
"100"
2615 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterator_hierarchy.png"/>
2618 <phrase>Point Iterator Hierarchy
</phrase>
2624 <para>Note that point-type iterators in self-organizing containers
2625 (hash-based associative containers) lack movement
2626 operators, such as
<literal>operator++
</literal> - in fact, this
2627 is the reason why this library differentiates from the standard C++ librarys
2628 design on this point.
</para>
2630 <para>Typically, one can determine an iterator's movement
2632 <literal>std::iterator_traits
<It
>iterator_category
</literal>,
2633 which is a
<literal>struct
</literal> indicating the iterator's
2634 movement capabilities. Unfortunately, none of the standard predefined
2635 categories reflect a pointer's
<emphasis>not
</emphasis> having any
2636 movement capabilities whatsoever. Consequently,
2637 <literal>pb_ds
</literal> adds a type
2638 <literal>trivial_iterator_tag
</literal> (whose name is taken from
2639 a concept in C++ standardese, which is the category of iterators
2640 with no movement capabilities.) All other standard C++ library
2641 tags, such as
<literal>forward_iterator_tag
</literal> retain their
2646 <section xml:
id=
"pbds.design.concepts.invalidation">
2647 <info><title>Invalidation Guarantees
</title></info>
2649 If one manipulates a container object, then iterators previously
2650 obtained from it can be invalidated. In some cases a
2651 previously-obtained iterator cannot be de-referenced; in other cases,
2652 the iterator's next or previous element might have changed
2653 unpredictably. This corresponds exactly to the question whether a
2654 point-type or range-type iterator (see previous concept) is valid or
2655 not. In this design, one can query a container (in compile time) about
2656 its invalidation guarantees.
2661 Given three different types of associative containers, a modifying
2662 operation (in that example,
<function>erase
</function>) invalidated
2663 iterators in three different ways: the iterator of one container
2664 remained completely valid - it could be de-referenced and
2665 incremented; the iterator of a different container could not even be
2666 de-referenced; the iterator of the third container could be
2667 de-referenced, but its
"next" iterator changed unpredictably.
2671 Distinguishing between find and range types allows fine-grained
2672 invalidation guarantees, because these questions correspond exactly
2673 to the question of whether point-type iterators and range-type
2674 iterators are valid. The graphic below shows tags corresponding to
2675 different types of invalidation guarantees.
2679 <title>Invalidation Guarantee Tags Hierarchy
</title>
2682 <imagedata align=
"center" format=
"PNG" scale=
"100"
2683 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_invalidation_tag_hierarchy.png"/>
2686 <imagedata align=
"center" format=
"PDF" scale=
"50"
2687 fileref=
"../images/pbds_invalidation_tag_hierarchy.pdf"/>
2690 <phrase>Invalidation Guarantee Tags Hierarchy
</phrase>
2698 <classname>basic_invalidation_guarantee
</classname>
2699 corresponds to a basic guarantee that a point-type iterator,
2700 a found pointer, or a found reference, remains valid as long
2701 as the container object is not modified.
2707 <classname>point_invalidation_guarantee
</classname>
2708 corresponds to a guarantee that a point-type iterator, a
2709 found pointer, or a found reference, remains valid even if
2710 the container object is modified.
2716 <classname>range_invalidation_guarantee
</classname>
2717 corresponds to a guarantee that a range-type iterator remains
2718 valid even if the container object is modified.
2723 <para>To find the invalidation guarantee of a
2724 container, one can use
</para>
2726 typename container_traits
<Cntnr
>::invalidation_guarantee
2729 <para>Note that this hierarchy corresponds to the logic it
2730 represents: if a container has range-invalidation guarantees,
2731 then it must also have find invalidation guarantees;
2732 correspondingly, its invalidation guarantee (in this case
2733 <classname>range_invalidation_guarantee
</classname>)
2734 can be cast to its base class (in this case
<classname>point_invalidation_guarantee
</classname>).
2735 This means that this this hierarchy can be used easily using
2736 standard metaprogramming techniques, by specializing on the
2737 type of
<literal>invalidation_guarantee
</literal>.
</para>
2740 These types of problems were addressed, in a more general
2741 setting, in
<xref linkend=
"biblio.meyers96more"/> - Item
2. In
2742 our opinion, an invalidation-guarantee hierarchy would solve
2743 these problems in all container types - not just associative
2748 </section> <!-- iterator semantics -->
2750 <section xml:
id=
"pbds.design.concepts.genericity">
2751 <info><title>Genericity
</title></info>
2754 The design attempts to address the following problem of
2755 data-structure genericity. When writing a function manipulating
2756 a generic container object, what is the behavior of the object?
2760 template
<typename Cntnr
>
2762 some_op_sequence(Cntnr
&r_container)
2769 then one needs to address the following questions in the body
2770 of
<function>some_op_sequence
</function>:
2776 Which types and methods does
<literal>Cntnr
</literal> support?
2777 Containers based on hash tables can be queries for the
2778 hash-functor type and object; this is meaningless for tree-based
2779 containers. Containers based on trees can be split, joined, or
2780 can erase iterators and return the following iterator; this
2781 cannot be done by hash-based containers.
2787 What are the exception and invalidation guarantees
2788 of
<literal>Cntnr
</literal>? A container based on a probing
2789 hash-table invalidates all iterators when it is modified; this
2790 is not the case for containers based on node-based
2791 trees. Containers based on a node-based tree can be split or
2792 joined without exceptions; this is not the case for containers
2793 based on vector-based trees.
2799 How does the container maintain its elements? Tree-based and
2800 Trie-based containers store elements by key order; others,
2801 typically, do not. A container based on a splay trees or lists
2802 with update policies
"cache" "frequently accessed" elements;
2803 containers based on most other underlying data structures do
2809 How does one query a container about characteristics and
2810 capabilities? What is the relationship between two different
2811 data structures, if anything?
2816 <para>The remainder of this section explains these issues in
2820 <section xml:
id=
"concepts.genericity.tag">
2821 <info><title>Tag
</title></info>
2823 Tags are very useful for manipulating generic types. For example, if
2824 <literal>It
</literal> is an iterator class, then
<literal>typename
2825 It::iterator_category
</literal> or
<literal>typename
2826 std::iterator_traits
<It
>::iterator_category
</literal> will
2827 yield its category, and
<literal>typename
2828 std::iterator_traits
<It
>::value_type
</literal> will yield its
2833 This library contains a container tag hierarchy corresponding to the
2838 <title>Container Tag Hierarchy
</title>
2841 <imagedata align=
"center" format=
"PNG" scale=
"100"
2842 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_container_tag_hierarchy.png"/>
2845 <imagedata align=
"center" format=
"PDF" scale=
"50"
2846 fileref=
"../images/pbds_container_tag_hierarchy.pdf"/>
2849 <phrase>Container Tag Hierarchy
</phrase>
2855 Given any container
<type>Cntnr
</type>, the tag of
2856 the underlying data structure can be found via
<literal>typename
2857 Cntnr::container_category
</literal>.
2860 </section> <!-- tag -->
2862 <section xml:
id=
"concepts.genericity.traits">
2863 <info><title>Traits
</title></info>
2866 <para>Additionally, a traits mechanism can be used to query a
2867 container type for its attributes. Given any container
2868 <literal>Cntnr
</literal>, then
<literal><Cntnr
></literal>
2869 is a traits class identifying the properties of the
2872 <para>To find if a container can throw when a key is erased (which
2873 is true for vector-based trees, for example), one can
2876 <programlisting>container_traits
<Cntnr
>::erase_can_throw
</programlisting>
2879 Some of the definitions in
<classname>container_traits
</classname>
2880 are dependent on other
2881 definitions. If
<classname>container_traits
<Cntnr
>::order_preserving
</classname>
2882 is
<constant>true
</constant> (which is the case for containers
2883 based on trees and tries), then the container can be split or
2885 case,
<classname>container_traits
<Cntnr
>::split_join_can_throw
</classname>
2886 indicates whether splits or joins can throw exceptions (which is
2887 true for vector-based trees);
2888 otherwise
<classname>container_traits
<Cntnr
>::split_join_can_throw
</classname>
2889 will yield a compilation error. (This is somewhat similar to a
2890 compile-time version of the COM model).
2893 </section> <!-- traits -->
2895 </section> <!-- genericity -->
2896 </section> <!-- concepts -->
2898 <section xml:
id=
"pbds.design.container">
2899 <info><title>By Container
</title></info>
2902 <section xml:
id=
"pbds.design.container.hash">
2903 <info><title>hash
</title></info>
2908 /// general terms / background
2909 /// range hashing policies
2910 /// ranged-hash policies
2916 /// trigger policies
2919 // policy interactions
2920 /// probe/size/trigger
2922 /// eq/hash/storing hash values
2923 /// size/load-check trigger
2925 <section xml:
id=
"container.hash.interface">
2926 <info><title>Interface
</title></info>
2931 The collision-chaining hash-based container has the
2932 following declaration.
</para>
2937 typename Hash_Fn = std::hash
<Key
>,
2938 typename Eq_Fn = std::equal_to
<Key
>,
2939 typename Comb_Hash_Fn = direct_mask_range_hashing
<>
2940 typename Resize_Policy = default explained below.
2941 bool Store_Hash = false,
2942 typename Allocator = std::allocator
<char
> >
2943 class cc_hash_table;
2946 <para>The parameters have the following meaning:
</para>
2949 <listitem><para><classname>Key
</classname> is the key type.
</para></listitem>
2951 <listitem><para><classname>Mapped
</classname> is the mapped-policy.
</para></listitem>
2953 <listitem><para><classname>Hash_Fn
</classname> is a key hashing functor.
</para></listitem>
2955 <listitem><para><classname>Eq_Fn
</classname> is a key equivalence functor.
</para></listitem>
2957 <listitem><para><classname>Comb_Hash_Fn
</classname> is a range-hashing_functor;
2958 it describes how to translate hash values into positions
2959 within the table.
</para></listitem>
2961 <listitem><para><classname>Resize_Policy
</classname> describes how a container object
2962 should change its internal size.
</para></listitem>
2964 <listitem><para><classname>Store_Hash
</classname> indicates whether the hash value
2965 should be stored with each entry.
</para></listitem>
2967 <listitem><para><classname>Allocator
</classname> is an allocator
2968 type.
</para></listitem>
2971 <para>The probing hash-based container has the following
2977 typename Hash_Fn = std::hash
<Key
>,
2978 typename Eq_Fn = std::equal_to
<Key
>,
2979 typename Comb_Probe_Fn = direct_mask_range_hashing
<>
2980 typename Probe_Fn = default explained below.
2981 typename Resize_Policy = default explained below.
2982 bool Store_Hash = false,
2983 typename Allocator = std::allocator
<char
> >
2984 class gp_hash_table;
2987 <para>The parameters are identical to those of the
2988 collision-chaining container, except for the following.
</para>
2991 <listitem><para><classname>Comb_Probe_Fn
</classname> describes how to transform a probe
2992 sequence into a sequence of positions within the table.
</para></listitem>
2994 <listitem><para><classname>Probe_Fn
</classname> describes a probe sequence policy.
</para></listitem>
2997 <para>Some of the default template values depend on the values of
2998 other parameters, and are explained below.
</para>
3001 <section xml:
id=
"container.hash.details">
3002 <info><title>Details
</title></info>
3004 <section xml:
id=
"container.hash.details.hash_policies">
3005 <info><title>Hash Policies
</title></info>
3007 <section xml:
id=
"details.hash_policies.general">
3008 <info><title>General
</title></info>
3010 <para>Following is an explanation of some functions which hashing
3011 involves. The graphic below illustrates the discussion.
</para>
3014 <title>Hash functions, ranged-hash functions, and
3015 range-hashing functions
</title>
3018 <imagedata align=
"center" format=
"PNG" scale=
"100"
3019 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
3022 <phrase>Hash functions, ranged-hash functions, and
3023 range-hashing functions
</phrase>
3028 <para>Let U be a domain (e.g., the integers, or the
3029 strings of
3 characters). A hash-table algorithm needs to map
3030 elements of U
"uniformly" into the range [
0,..., m -
3031 1] (where m is a non-negative integral value, and
3032 is, in general, time varying). I.e., the algorithm needs
3033 a ranged-hash function
</para>
3036 f : U × Z
<subscript>+
</subscript> → Z
<subscript>+
</subscript>
3039 <para>such that for any u in U ,
</para>
3041 <para>0 ≤ f(u, m) ≤ m -
1</para>
3043 <para>and which has
"good uniformity" properties (say
3044 <xref linkend=
"biblio.knuth98sorting"/>.)
3046 common solution is to use the composition of the hash
3049 <para>h : U → Z
<subscript>+
</subscript> ,
</para>
3051 <para>which maps elements of U into the non-negative
3052 integrals, and
</para>
3054 <para>g : Z
<subscript>+
</subscript> × Z
<subscript>+
</subscript> →
3055 Z
<subscript>+
</subscript>,
</para>
3057 <para>which maps a non-negative hash value, and a non-negative
3058 range upper-bound into a non-negative integral in the range
3059 between
0 (inclusive) and the range upper bound (exclusive),
3060 i.e., for any r in Z
<subscript>+
</subscript>,
</para>
3062 <para>0 ≤ g(r, m) ≤ m -
1</para>
3065 <para>The resulting ranged-hash function, is
</para>
3067 <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3069 <title>Ranged Hash Function
</title>
3071 f(u , m) = g(h(u), m)
3075 <para>From the above, it is obvious that given g and
3076 h, f can always be composed (however the converse
3077 is not true). The standard's hash-based containers allow specifying
3078 a hash function, and use a hard-wired range-hashing function;
3079 the ranged-hash function is implicitly composed.
</para>
3081 <para>The above describes the case where a key is to be mapped
3082 into a single position within a hash table, e.g.,
3083 in a collision-chaining table. In other cases, a key is to be
3084 mapped into a sequence of positions within a table,
3085 e.g., in a probing table. Similar terms apply in this
3086 case: the table requires a ranged probe function,
3087 mapping a key into a sequence of positions withing the table.
3088 This is typically achieved by composing a hash function
3089 mapping the key into a non-negative integral type, a
3090 probe function transforming the hash value into a
3091 sequence of hash values, and a range-hashing function
3092 transforming the sequence of hash values into a sequence of
3097 <section xml:
id=
"details.hash_policies.range">
3098 <info><title>Range Hashing
</title></info>
3100 <para>Some common choices for range-hashing functions are the
3101 division, multiplication, and middle-square methods (
<xref linkend=
"biblio.knuth98sorting"/>), defined
3105 <title>Range-Hashing, Division Method
</title>
3113 <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉
</para>
3117 <para>g(r, m) = ⌈ u/v ( r
<superscript>2</superscript> mod v ) ⌉
</para>
3119 <para>respectively, for some positive integrals u and
3120 v (typically powers of
2), and some a. Each of
3121 these range-hashing functions works best for some different
3124 <para>The division method (see above) is a
3125 very common choice. However, even this single method can be
3126 implemented in two very different ways. It is possible to
3127 implement using the low
3128 level % (modulo) operation (for any m), or the
3129 low level
& (bit-mask) operation (for the case where
3130 m is a power of
2), i.e.,
</para>
3133 <title>Division via Prime Modulo
</title>
3142 <title>Division via Bit Mask
</title>
3144 g(r, m) = r
& m -
1, (with m =
3145 2<superscript>k
</superscript> for some k)
3150 <para>respectively.
</para>
3152 <para>The % (modulo) implementation has the advantage that for
3153 m a prime far from a power of
2, g(r, m) is
3154 affected by all the bits of r (minimizing the chance of
3155 collision). It has the disadvantage of using the costly modulo
3156 operation. This method is hard-wired into SGI's implementation
3159 <para>The
& (bit-mask) implementation has the advantage of
3160 relying on the fast bit-wise and operation. It has the
3161 disadvantage that for g(r, m) is affected only by the
3162 low order bits of r. This method is hard-wired into
3163 Dinkumware's implementation.
</para>
3168 <section xml:
id=
"details.hash_policies.ranged">
3169 <info><title>Ranged Hash
</title></info>
3171 <para>In cases it is beneficial to allow the
3172 client to directly specify a ranged-hash hash function. It is
3173 true, that the writer of the ranged-hash function cannot rely
3174 on the values of m having specific numerical properties
3175 suitable for hashing (in the sense used in
<xref linkend=
"biblio.knuth98sorting"/>), since
3176 the values of m are determined by a resize policy with
3177 possibly orthogonal considerations.
</para>
3179 <para>There are two cases where a ranged-hash function can be
3180 superior. The firs is when using perfect hashing: the
3181 second is when the values of m can be used to estimate
3182 the
"general" number of distinct values required. This is
3183 described in the following.
</para>
3188 s = [ s
<subscript>0</subscript>,..., s
<subscript>t -
1</subscript>]
3191 <para>be a string of t characters, each of which is from
3192 domain S. Consider the following ranged-hash
3196 A Standard String Hash Function
3199 f
<subscript>1</subscript>(s, m) = ∑
<subscript>i =
3200 0</subscript><superscript>t -
1</superscript> s
<subscript>i
</subscript> a
<superscript>i
</superscript> mod m
3205 <para>where a is some non-negative integral value. This is
3206 the standard string-hashing function used in SGI's
3207 implementation (with a =
5). Its advantage is that
3208 it takes into account all of the characters of the string.
</para>
3210 <para>Now assume that s is the string representation of a
3211 of a long DNA sequence (and so S = {'A', 'C', 'G',
3212 'T'}). In this case, scanning the entire string might be
3213 prohibitively expensive. A possible alternative might be to use
3214 only the first k characters of the string, where
</para>
3216 <para>|S|
<superscript>k
</superscript> ≥ m ,
</para>
3218 <para>i.e., using the hash function
</para>
3222 Only k String DNA Hash
3225 f
<subscript>2</subscript>(s, m) = ∑
<subscript>i
3226 =
0</subscript><superscript>k -
1</superscript> s
<subscript>i
</subscript> a
<superscript>i
</superscript> mod m
3230 <para>requiring scanning over only
</para>
3232 <para>k = log
<subscript>4</subscript>( m )
</para>
3234 <para>characters.
</para>
3236 <para>Other more elaborate hash-functions might scan k
3237 characters starting at a random position (determined at each
3238 resize), or scanning k random positions (determined at
3239 each resize), i.e., using
</para>
3241 <para>f
<subscript>3</subscript>(s, m) = ∑
<subscript>i =
3242 r
</subscript>0<superscript>r
<subscript>0</subscript> + k -
1</superscript> s
<subscript>i
</subscript>
3243 a
<superscript>i
</superscript> mod m ,
</para>
3247 <para>f
<subscript>4</subscript>(s, m) = ∑
<subscript>i =
0</subscript><superscript>k -
3248 1</superscript> s
<subscript>r
</subscript>i a
<superscript>r
<subscript>i
</subscript></superscript> mod
3251 <para>respectively, for r
<subscript>0</subscript>,..., r
<subscript>k-
1</subscript>
3252 each in the (inclusive) range [
0,...,t-
1].
</para>
3254 <para>It should be noted that the above functions cannot be
3255 decomposed as per a ranged hash composed of hash and range hashing.
</para>
3260 <section xml:
id=
"details.hash_policies.implementation">
3261 <info><title>Implementation
</title></info>
3263 <para>This sub-subsection describes the implementation of
3264 the above in this library. It first explains range-hashing
3265 functions in collision-chaining tables, then ranged-hash
3266 functions in collision-chaining tables, then probing-based
3267 tables, and finally lists the relevant classes in this
3270 <section xml:
id=
"hash_policies.implementation.collision-chaining">
3272 Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3276 <para><classname>cc_hash_table
</classname> is
3277 parametrized by
<classname>Hash_Fn
</classname> and
<classname>Comb_Hash_Fn
</classname>, a
3278 hash functor and a combining hash functor, respectively.
</para>
3280 <para>In general,
<classname>Comb_Hash_Fn
</classname> is considered a
3281 range-hashing functor.
<classname>cc_hash_table
</classname>
3282 synthesizes a ranged-hash function from
<classname>Hash_Fn
</classname> and
3283 <classname>Comb_Hash_Fn
</classname>. The figure below shows an
<classname>insert
</classname> sequence
3284 diagram for this case. The user inserts an element (point A),
3285 the container transforms the key into a non-negative integral
3286 using the hash functor (points B and C), and transforms the
3287 result into a position using the combining functor (points D
3291 <title>Insert hash sequence diagram
</title>
3294 <imagedata align=
"center" format=
"PNG" scale=
"100"
3295 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_range_hashing_seq_diagram.png"/>
3298 <phrase>Insert hash sequence diagram
</phrase>
3303 <para>If
<classname>cc_hash_table
</classname>'s
3304 hash-functor,
<classname>Hash_Fn
</classname> is instantiated by
<classname>null_type
</classname> , then
<classname>Comb_Hash_Fn
</classname> is taken to be
3305 a ranged-hash function. The graphic below shows an
<function>insert
</function> sequence
3306 diagram. The user inserts an element (point A), the container
3307 transforms the key into a position using the combining functor
3308 (points B and C).
</para>
3311 <title>Insert hash sequence diagram with a null policy
</title>
3314 <imagedata align=
"center" format=
"PNG" scale=
"100"
3315 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_range_hashing_seq_diagram2.png"/>
3318 <phrase>Insert hash sequence diagram with a null policy
</phrase>
3325 <section xml:
id=
"hash_policies.implementation.probe">
3329 <para><classname>gp_hash_table
</classname> is parametrized by
3330 <classname>Hash_Fn
</classname>,
<classname>Probe_Fn
</classname>,
3331 and
<classname>Comb_Probe_Fn
</classname>. As before, if
3332 <classname>Hash_Fn
</classname> and
<classname>Probe_Fn
</classname>
3333 are both
<classname>null_type
</classname>, then
3334 <classname>Comb_Probe_Fn
</classname> is a ranged-probe
3335 functor. Otherwise,
<classname>Hash_Fn
</classname> is a hash
3336 functor,
<classname>Probe_Fn
</classname> is a functor for offsets
3337 from a hash value, and
<classname>Comb_Probe_Fn
</classname>
3338 transforms a probe sequence into a sequence of positions within
3343 <section xml:
id=
"hash_policies.implementation.predefined">
3345 Pre-Defined Policies
3348 <para>This library contains some pre-defined classes
3349 implementing range-hashing and probing functions:
</para>
3352 <listitem><para><classname>direct_mask_range_hashing
</classname>
3353 and
<classname>direct_mod_range_hashing
</classname>
3354 are range-hashing functions based on a bit-mask and a modulo
3355 operation, respectively.
</para></listitem>
3357 <listitem><para><classname>linear_probe_fn
</classname>, and
3358 <classname>quadratic_probe_fn
</classname> are
3359 a linear probe and a quadratic probe function,
3360 respectively.
</para></listitem>
3364 The graphic below shows the relationships.
3367 <title>Hash policy class diagram
</title>
3370 <imagedata align=
"center" format=
"PNG" scale=
"100"
3371 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_policy_cd.png"/>
3374 <phrase>Hash policy class diagram
</phrase>
3382 </section> <!-- impl -->
3386 <section xml:
id=
"container.hash.details.resize_policies">
3387 <info><title>Resize Policies
</title></info>
3389 <section xml:
id=
"resize_policies.general">
3390 <info><title>General
</title></info>
3392 <para>Hash-tables, as opposed to trees, do not naturally grow or
3393 shrink. It is necessary to specify policies to determine how
3394 and when a hash table should change its size. Usually, resize
3395 policies can be decomposed into orthogonal policies:
</para>
3398 <listitem><para>A size policy indicating how a hash table
3399 should grow (e.g., it should multiply by powers of
3400 2).
</para></listitem>
3402 <listitem><para>A trigger policy indicating when a hash
3403 table should grow (e.g., a load factor is
3404 exceeded).
</para></listitem>
3409 <section xml:
id=
"resize_policies.size">
3410 <info><title>Size Policies
</title></info>
3413 <para>Size policies determine how a hash table changes size. These
3414 policies are simple, and there are relatively few sensible
3415 options. An exponential-size policy (with the initial size and
3416 growth factors both powers of
2) works well with a mask-based
3417 range-hashing function, and is the
3418 hard-wired policy used by Dinkumware. A
3419 prime-list based policy works well with a modulo-prime range
3420 hashing function and is the hard-wired policy used by SGI's
3421 implementation.
</para>
3425 <section xml:
id=
"resize_policies.trigger">
3426 <info><title>Trigger Policies
</title></info>
3428 <para>Trigger policies determine when a hash table changes size.
3429 Following is a description of two policies: load-check
3430 policies, and collision-check policies.
</para>
3432 <para>Load-check policies are straightforward. The user specifies
3433 two factors, Α
<subscript>min
</subscript> and
3434 Α
<subscript>max
</subscript>, and the hash table maintains the
3435 invariant that
</para>
3437 <para>Α
<subscript>min
</subscript> ≤ (number of
3438 stored elements) / (hash-table size) ≤
3439 Α
<subscript>max
</subscript><remark>load factor min max
</remark></para>
3441 <para>Collision-check policies work in the opposite direction of
3442 load-check policies. They focus on keeping the number of
3443 collisions moderate and hoping that the size of the table will
3444 not grow very large, instead of keeping a moderate load-factor
3445 and hoping that the number of collisions will be small. A
3446 maximal collision-check policy resizes when the longest
3447 probe-sequence grows too large.
</para>
3449 <para>Consider the graphic below. Let the size of the hash table
3450 be denoted by m, the length of a probe sequence be denoted by k,
3451 and some load factor be denoted by Α. We would like to
3452 calculate the minimal length of k, such that if there were Α
3453 m elements in the hash table, a probe sequence of length k would
3454 be found with probability at most
1/m.
</para>
3457 <title>Balls and bins
</title>
3460 <imagedata align=
"center" format=
"PNG" scale=
"100"
3461 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_balls_and_bins.png"/>
3464 <phrase>Balls and bins
</phrase>
3469 <para>Denote the probability that a probe sequence of length
3470 k appears in bin i by p
<subscript>i
</subscript>, the
3471 length of the probe sequence of bin i by
3472 l
<subscript>i
</subscript>, and assume uniform distribution. Then
</para>
3478 Probability of Probe Sequence of Length k
3481 p
<subscript>1</subscript> =
3485 <para>P(l
<subscript>1</subscript> ≥ k) =
</para>
3488 P(l
<subscript>1</subscript> ≥ α (
1 + k / α -
1) ≤ (a)
3492 e ^ ( - ( α ( k / α -
1 )
<superscript>2</superscript> ) /
2)
3495 <para>where (a) follows from the Chernoff bound (
<xref linkend=
"biblio.motwani95random"/>). To
3496 calculate the probability that some bin contains a probe
3497 sequence greater than k, we note that the
3498 l
<subscript>i
</subscript> are negatively-dependent
3499 (
<xref linkend=
"biblio.dubhashi98neg"/>)
3501 I(.) denote the indicator function. Then
</para>
3505 Probability Probe Sequence in Some Bin
3508 P( exists
<subscript>i
</subscript> l
<subscript>i
</subscript> ≥ k ) =
3512 <para>P ( ∑
<subscript>i =
1</subscript><superscript>m
</superscript>
3513 I(l
<subscript>i
</subscript> ≥ k) ≥
1 ) =
</para>
3515 <para>P ( ∑
<subscript>i =
1</subscript><superscript>m
</superscript> I (
3516 l
<subscript>i
</subscript> ≥ k ) ≥ m p
<subscript>1</subscript> (
1 +
1 / (m
3517 p
<subscript>1</subscript>) -
1 ) ) ≤ (a)
</para>
3519 <para>e ^ ( ( - m p
<subscript>1</subscript> (
1 / (m p
<subscript>1</subscript>)
3520 -
1 )
<superscript>2</superscript> ) /
2 ) ,
</para>
3522 <para>where (a) follows from the fact that the Chernoff bound can
3523 be applied to negatively-dependent variables (
<xref
3524 linkend=
"biblio.dubhashi98neg"/>). Inserting the first probability
3525 equation into the second one, and equating with
1/m, we
3529 <para>k ~ √ (
2 α ln
2 m ln(m) )
3534 <section xml:
id=
"resize_policies.impl">
3535 <info><title>Implementation
</title></info>
3537 <para>This sub-subsection describes the implementation of the
3538 above in this library. It first describes resize policies and
3539 their decomposition into trigger and size policies, then
3540 describes pre-defined classes, and finally discusses controlled
3541 access the policies' internals.
</para>
3543 <section xml:
id=
"resize_policies.impl.decomposition">
3544 <info><title>Decomposition
</title></info>
3547 <para>Each hash-based container is parametrized by a
3548 <classname>Resize_Policy
</classname> parameter; the container derives
3549 <classname>public
</classname>ly from
<classname>Resize_Policy
</classname>. For
3552 cc_hash_table
<typename Key,
3555 typename Resize_Policy
3556 ...
> : public Resize_Policy
3559 <para>As a container object is modified, it continuously notifies
3560 its
<classname>Resize_Policy
</classname> base of internal changes
3561 (e.g., collisions encountered and elements being
3562 inserted). It queries its
<classname>Resize_Policy
</classname> base whether
3563 it needs to be resized, and if so, to what size.
</para>
3565 <para>The graphic below shows a (possible) sequence diagram
3566 of an insert operation. The user inserts an element; the hash
3567 table notifies its resize policy that a search has started
3568 (point A); in this case, a single collision is encountered -
3569 the table notifies its resize policy of this (point B); the
3570 container finally notifies its resize policy that the search
3571 has ended (point C); it then queries its resize policy whether
3572 a resize is needed, and if so, what is the new size (points D
3573 to G); following the resize, it notifies the policy that a
3574 resize has completed (point H); finally, the element is
3575 inserted, and the policy notified (point I).
</para>
3578 <title>Insert resize sequence diagram
</title>
3581 <imagedata align=
"center" format=
"PNG" scale=
"100"
3582 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram1.png"/>
3585 <phrase>Insert resize sequence diagram
</phrase>
3591 <para>In practice, a resize policy can be usually orthogonally
3592 decomposed to a size policy and a trigger policy. Consequently,
3593 the library contains a single class for instantiating a resize
3594 policy:
<classname>hash_standard_resize_policy
</classname>
3595 is parametrized by
<classname>Size_Policy
</classname> and
3596 <classname>Trigger_Policy
</classname>, derives
<classname>public
</classname>ly from
3597 both, and acts as a standard delegate (
<xref linkend=
"biblio.gof"/>)
3598 to these policies.
</para>
3600 <para>The two graphics immediately below show sequence diagrams
3601 illustrating the interaction between the standard resize policy
3602 and its trigger and size policies, respectively.
</para>
3605 <title>Standard resize policy trigger sequence
3609 <imagedata align=
"center" format=
"PNG" scale=
"100"
3610 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram2.png"/>
3613 <phrase>Standard resize policy trigger sequence
3620 <title>Standard resize policy size sequence
3624 <imagedata align=
"center" format=
"PNG" scale=
"100"
3625 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram3.png"/>
3628 <phrase>Standard resize policy size sequence
3637 <section xml:
id=
"resize_policies.impl.predefined">
3638 <info><title>Predefined Policies
</title></info>
3639 <para>The library includes the following
3640 instantiations of size and trigger policies:
</para>
3643 <listitem><para><classname>hash_load_check_resize_trigger
</classname>
3644 implements a load check trigger policy.
</para></listitem>
3646 <listitem><para><classname>cc_hash_max_collision_check_resize_trigger
</classname>
3647 implements a collision check trigger policy.
</para></listitem>
3649 <listitem><para><classname>hash_exponential_size_policy
</classname>
3650 implements an exponential-size policy (which should be used
3651 with mask range hashing).
</para></listitem>
3653 <listitem><para><classname>hash_prime_size_policy
</classname>
3654 implementing a size policy based on a sequence of primes
3656 be used with mod range hashing
</para></listitem>
3659 <para>The graphic below gives an overall picture of the resize-related
3660 classes.
<classname>basic_hash_table
</classname>
3661 is parametrized by
<classname>Resize_Policy
</classname>, which it subclasses
3662 publicly. This class is currently instantiated only by
<classname>hash_standard_resize_policy
</classname>.
3663 <classname>hash_standard_resize_policy
</classname>
3664 itself is parametrized by
<classname>Trigger_Policy
</classname> and
3665 <classname>Size_Policy
</classname>. Currently,
<classname>Trigger_Policy
</classname> is
3666 instantiated by
<classname>hash_load_check_resize_trigger
</classname>,
3667 or
<classname>cc_hash_max_collision_check_resize_trigger
</classname>;
3668 <classname>Size_Policy
</classname> is instantiated by
<classname>hash_exponential_size_policy
</classname>,
3669 or
<classname>hash_prime_size_policy
</classname>.
</para>
3673 <section xml:
id=
"resize_policies.impl.internals">
3674 <info><title>Controling Access to Internals
</title></info>
3676 <para>There are cases where (controlled) access to resize
3677 policies' internals is beneficial. E.g., it is sometimes
3678 useful to query a hash-table for the table's actual size (as
3679 opposed to its
<function>size()
</function> - the number of values it
3680 currently holds); it is sometimes useful to set a table's
3681 initial size, externally resize it, or change load factors.
</para>
3683 <para>Clearly, supporting such methods both decreases the
3684 encapsulation of hash-based containers, and increases the
3685 diversity between different associative-containers' interfaces.
3686 Conversely, omitting such methods can decrease containers'
3689 <para>In order to avoid, to the extent possible, the above
3690 conflict, the hash-based containers themselves do not address
3691 any of these questions; this is deferred to the resize policies,
3692 which are easier to change or replace. Thus, for example,
3693 neither
<classname>cc_hash_table
</classname> nor
3694 <classname>gp_hash_table
</classname>
3695 contain methods for querying the actual size of the table; this
3696 is deferred to
<classname>hash_standard_resize_policy
</classname>.
</para>
3698 <para>Furthermore, the policies themselves are parametrized by
3699 template arguments that determine the methods they support
3701 <xref linkend=
"biblio.alexandrescu01modern"/>
3702 shows techniques for doing so).
<classname>hash_standard_resize_policy
</classname>
3703 is parametrized by
<classname>External_Size_Access
</classname> that
3704 determines whether it supports methods for querying the actual
3705 size of the table or resizing it.
<classname>hash_load_check_resize_trigger
</classname>
3706 is parametrized by
<classname>External_Load_Access
</classname> that
3707 determines whether it supports methods for querying or
3708 modifying the loads.
<classname>cc_hash_max_collision_check_resize_trigger
</classname>
3709 is parametrized by
<classname>External_Load_Access
</classname> that
3710 determines whether it supports methods for querying the
3713 <para>Some operations, for example, resizing a container at
3714 run time, or changing the load factors of a load-check trigger
3715 policy, require the container itself to resize. As mentioned
3716 above, the hash-based containers themselves do not contain
3717 these types of methods, only their resize policies.
3718 Consequently, there must be some mechanism for a resize policy
3719 to manipulate the hash-based container. As the hash-based
3720 container is a subclass of the resize policy, this is done
3721 through virtual methods. Each hash-based container has a
3722 <classname>private
</classname> <classname>virtual
</classname> method:
</para>
3726 (size_type new_size);
3729 <para>which resizes the container. Implementations of
3730 <classname>Resize_Policy
</classname> can export public methods for resizing
3731 the container externally; these methods internally call
3732 <classname>do_resize
</classname> to resize the table.
</para>
3740 </section> <!-- resize policies -->
3742 <section xml:
id=
"container.hash.details.policy_interaction">
3743 <info><title>Policy Interactions
</title></info>
3746 <para>Hash-tables are unfortunately especially susceptible to
3747 choice of policies. One of the more complicated aspects of this
3748 is that poor combinations of good policies can form a poor
3749 container. Following are some considerations.
</para>
3751 <section xml:
id=
"policy_interaction.probesizetrigger">
3752 <info><title>probe/size/trigger
</title></info>
3754 <para>Some combinations do not work well for probing containers.
3755 For example, combining a quadratic probe policy with an
3756 exponential size policy can yield a poor container: when an
3757 element is inserted, a trigger policy might decide that there
3758 is no need to resize, as the table still contains unused
3759 entries; the probe sequence, however, might never reach any of
3760 the unused entries.
</para>
3762 <para>Unfortunately, this library cannot detect such problems at
3763 compilation (they are halting reducible). It therefore defines
3764 an exception class
<classname>insert_error
</classname> to throw an
3765 exception in this case.
</para>
3769 <section xml:
id=
"policy_interaction.hashtrigger">
3770 <info><title>hash/trigger
</title></info>
3772 <para>Some trigger policies are especially susceptible to poor
3773 hash functions. Suppose, as an extreme case, that the hash
3774 function transforms each key to the same hash value. After some
3775 inserts, a collision detecting policy will always indicate that
3776 the container needs to grow.
</para>
3778 <para>The library, therefore, by design, limits each operation to
3779 one resize. For each
<classname>insert
</classname>, for example, it queries
3780 only once whether a resize is needed.
</para>
3784 <section xml:
id=
"policy_interaction.eqstorehash">
3785 <info><title>equivalence functors/storing hash values/hash
</title></info>
3787 <para><classname>cc_hash_table
</classname> and
3788 <classname>gp_hash_table
</classname> are
3789 parametrized by an equivalence functor and by a
3790 <classname>Store_Hash
</classname> parameter. If the latter parameter is
3791 <classname>true
</classname>, then the container stores with each entry
3792 a hash value, and uses this value in case of collisions to
3793 determine whether to apply a hash value. This can lower the
3794 cost of collision for some types, but increase the cost of
3795 collisions for other types.
</para>
3797 <para>If a ranged-hash function or ranged probe function is
3798 directly supplied, however, then it makes no sense to store the
3799 hash value with each entry. This library's container will
3800 fail at compilation, by design, if this is attempted.
</para>
3804 <section xml:
id=
"policy_interaction.sizeloadtrigger">
3805 <info><title>size/load-check trigger
</title></info>
3807 <para>Assume a size policy issues an increasing sequence of sizes
3808 a, a q, a q
<superscript>1</superscript>, a q
<superscript>2</superscript>, ... For
3809 example, an exponential size policy might issue the sequence of
3810 sizes
8,
16,
32,
64, ...
</para>
3812 <para>If a load-check trigger policy is used, with loads
3813 α
<subscript>min
</subscript> and α
<subscript>max
</subscript>,
3814 respectively, then it is a good idea to have:
</para>
3817 <listitem><para>α
<subscript>max
</subscript> ~
1 / q
</para></listitem>
3819 <listitem><para>α
<subscript>min
</subscript> < 1 / (
2 q)
</para></listitem>
3822 <para>This will ensure that the amortized hash cost of each
3823 modifying operation is at most approximately
3.
</para>
3825 <para>α
<subscript>min
</subscript> ~ α
<subscript>max
</subscript> is, in
3826 any case, a bad choice, and α
<subscript>min
</subscript> >
3827 α
<subscript>max
</subscript> is horrendous.
</para>
3833 </section> <!-- details -->
3835 </section> <!-- hash -->
3838 <section xml:
id=
"pbds.design.container.tree">
3839 <info><title>tree
</title></info>
3841 <section xml:
id=
"container.tree.interface">
3842 <info><title>Interface
</title></info>
3844 <para>The tree-based container has the following declaration:
</para>
3849 typename Cmp_Fn = std::less
<Key
>,
3850 typename Tag = rb_tree_tag,
3852 typename Const_Node_Iterator,
3853 typename Node_Iterator,
3855 typename Allocator_
>
3856 class Node_Update = null_node_update,
3857 typename Allocator = std::allocator
<char
> >
3861 <para>The parameters have the following meaning:
</para>
3865 <para><classname>Key
</classname> is the key type.
</para></listitem>
3868 <para><classname>Mapped
</classname> is the mapped-policy.
</para></listitem>
3871 <para><classname>Cmp_Fn
</classname> is a key comparison functor
</para></listitem>
3874 <para><classname>Tag
</classname> specifies which underlying data structure
3875 to use.
</para></listitem>
3878 <para><classname>Node_Update
</classname> is a policy for updating node
3879 invariants.
</para></listitem>
3882 <para><classname>Allocator
</classname> is an allocator
3883 type.
</para></listitem>
3886 <para>The
<classname>Tag
</classname> parameter specifies which underlying
3887 data structure to use. Instantiating it by
<classname>rb_tree_tag
</classname>,
<classname>splay_tree_tag
</classname>, or
3888 <classname>ov_tree_tag
</classname>,
3889 specifies an underlying red-black tree, splay tree, or
3890 ordered-vector tree, respectively; any other tag is illegal.
3891 Note that containers based on the former two contain more types
3892 and methods than the latter (e.g.,
3893 <classname>reverse_iterator
</classname> and
<classname>rbegin
</classname>), and different
3894 exception and invalidation guarantees.
</para>
3898 <section xml:
id=
"container.tree.details">
3899 <info><title>Details
</title></info>
3901 <section xml:
id=
"container.tree.node">
3902 <info><title>Node Invariants
</title></info>
3905 <para>Consider the two trees in the graphic below, labels A and B. The first
3906 is a tree of floats; the second is a tree of pairs, each
3907 signifying a geometric line interval. Each element in a tree is refered to as a node of the tree. Of course, each of
3908 these trees can support the usual queries: the first can easily
3909 search for
<classname>0.4</classname>; the second can easily search for
3910 <classname>std::make_pair(
10,
41)
</classname>.
</para>
3912 <para>Each of these trees can efficiently support other queries.
3913 The first can efficiently determine that the
2rd key in the
3914 tree is
<constant>0.3</constant>; the second can efficiently determine
3915 whether any of its intervals overlaps
3916 <programlisting>std::make_pair(
29,
42)
</programlisting> (useful in geometric
3917 applications or distributed file systems with leases, for
3918 example). It should be noted that an
<classname>std::set
</classname> can
3919 only solve these types of problems with linear complexity.
</para>
3921 <para>In order to do so, each tree stores some metadata in
3922 each node, and maintains node invariants (see
<xref linkend=
"biblio.clrs2001"/>.) The first stores in
3923 each node the size of the sub-tree rooted at the node; the
3924 second stores at each node the maximal endpoint of the
3925 intervals at the sub-tree rooted at the node.
</para>
3928 <title>Tree node invariants
</title>
3931 <imagedata align=
"center" format=
"PNG" scale=
"100"
3932 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_invariants.png"/>
3935 <phrase>Tree node invariants
</phrase>
3940 <para>Supporting such trees is difficult for a number of
3944 <listitem><para>There must be a way to specify what a node's metadata
3945 should be (if any).
</para></listitem>
3947 <listitem><para>Various operations can invalidate node
3948 invariants. The graphic below shows how a right rotation,
3949 performed on A, results in B, with nodes x and y having
3950 corrupted invariants (the grayed nodes in C). The graphic shows
3951 how an insert, performed on D, results in E, with nodes x and y
3952 having corrupted invariants (the grayed nodes in F). It is not
3953 feasible to know outside the tree the effect of an operation on
3954 the nodes of the tree.
</para></listitem>
3956 <listitem><para>The search paths of standard associative containers are
3957 defined by comparisons between keys, and not through
3958 metadata.
</para></listitem>
3960 <listitem><para>It is not feasible to know in advance which methods trees
3961 can support. Besides the usual
<classname>find
</classname> method, the
3962 first tree can support a
<classname>find_by_order
</classname> method, while
3963 the second can support an
<classname>overlaps
</classname> method.
</para></listitem>
3967 <title>Tree node invalidation
</title>
3970 <imagedata align=
"center" format=
"PNG" scale=
"100"
3971 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_invalidations.png"/>
3974 <phrase>Tree node invalidation
</phrase>
3979 <para>These problems are solved by a combination of two means:
3980 node iterators, and template-template node updater
3983 <section xml:
id=
"container.tree.node.iterators">
3984 <info><title>Node Iterators
</title></info>
3987 <para>Each tree-based container defines two additional iterator
3988 types,
<classname>const_node_iterator
</classname>
3989 and
<classname>node_iterator
</classname>.
3990 These iterators allow descending from a node to one of its
3991 children. Node iterator allow search paths different than those
3992 determined by the comparison functor. The
<classname>tree
</classname>
3993 supports the methods:
</para>
4008 <para>The first pairs return node iterators corresponding to the
4009 root node of the tree; the latter pair returns node iterators
4010 corresponding to a just-after-leaf node.
</para>
4013 <section xml:
id=
"container.tree.node.updator">
4014 <info><title>Node Updator
</title></info>
4016 <para>The tree-based containers are parametrized by a
4017 <classname>Node_Update
</classname> template-template parameter. A
4018 tree-based container instantiates
4019 <classname>Node_Update
</classname> to some
4020 <classname>node_update
</classname> class, and publicly subclasses
4021 <classname>node_update
</classname>. The graphic below shows this
4022 scheme, as well as some predefined policies (which are explained
4026 <title>A tree and its update policy
</title>
4029 <imagedata align=
"center" format=
"PNG" scale=
"100"
4030 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_updator_policy_cd.png"/>
4033 <phrase>A tree and its update policy
</phrase>
4038 <para><classname>node_update
</classname> (an instantiation of
4039 <classname>Node_Update
</classname>) must define
<classname>metadata_type
</classname> as
4040 the type of metadata it requires. For order statistics,
4041 e.g.,
<classname>metadata_type
</classname> might be
<classname>size_t
</classname>.
4042 The tree defines within each node a
<classname>metadata_type
</classname>
4045 <para><classname>node_update
</classname> must also define the following method
4046 for restoring node invariants:
</para>
4049 operator()(node_iterator nd_it, const_node_iterator end_nd_it)
4052 <para>In this method,
<varname>nd_it
</varname> is a
4053 <classname>node_iterator
</classname> corresponding to a node whose
4054 A) all descendants have valid invariants, and B) its own
4055 invariants might be violated;
<classname>end_nd_it
</classname> is
4056 a
<classname>const_node_iterator
</classname> corresponding to a
4057 just-after-leaf node. This method should correct the node
4058 invariants of the node pointed to by
4059 <classname>nd_it
</classname>. For example, say node x in the
4060 graphic below label A has an invalid invariant, but its' children,
4061 y and z have valid invariants. After the invocation, all three
4062 nodes should have valid invariants, as in label B.
</para>
4066 <title>Restoring node invariants
</title>
4069 <imagedata align=
"center" format=
"PNG" scale=
"100"
4070 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_restoring_node_invariants.png"/>
4073 <phrase>Restoring node invariants
</phrase>
4078 <para>When a tree operation might invalidate some node invariant,
4079 it invokes this method in its
<classname>node_update
</classname> base to
4080 restore the invariant. For example, the graphic below shows
4081 an
<function>insert
</function> operation (point A); the tree performs some
4082 operations, and calls the update functor three times (points B,
4083 C, and D). (It is well known that any
<function>insert
</function>,
4084 <function>erase
</function>,
<function>split
</function> or
<function>join
</function>, can restore
4085 all node invariants by a small number of node invariant updates (
<xref linkend=
"biblio.clrs2001"/>)
4089 <title>Insert update sequence
</title>
4092 <imagedata align=
"center" format=
"PNG" scale=
"100"
4093 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_update_seq_diagram.png"/>
4096 <phrase>Insert update sequence
</phrase>
4101 <para>To complete the description of the scheme, three questions
4102 need to be answered:
</para>
4105 <listitem><para>How can a tree which supports order statistics define a
4106 method such as
<classname>find_by_order
</classname>?
</para></listitem>
4108 <listitem><para>How can the node updater base access methods of the
4109 tree?
</para></listitem>
4111 <listitem><para>How can the following cyclic dependency be resolved?
4112 <classname>node_update
</classname> is a base class of the tree, yet it
4113 uses node iterators defined in the tree (its child).
</para></listitem>
4116 <para>The first two questions are answered by the fact that
4117 <classname>node_update
</classname> (an instantiation of
4118 <classname>Node_Update
</classname>) is a
<emphasis>public
</emphasis> base class
4119 of the tree. Consequently:
</para>
4122 <listitem><para>Any public methods of
4123 <classname>node_update
</classname> are automatically methods of
4124 the tree (
<xref linkend=
"biblio.alexandrescu01modern"/>).
4125 Thus an order-statistics node updater,
4126 <classname>tree_order_statistics_node_update
</classname> defines
4127 the
<function>find_by_order
</function> method; any tree
4128 instantiated by this policy consequently supports this method as
4129 well.
</para></listitem>
4131 <listitem><para>In C++, if a base class declares a method as
4132 <literal>virtual
</literal>, it is
4133 <literal>virtual
</literal> in its subclasses. If
4134 <classname>node_update
</classname> needs to access one of the
4135 tree's methods, say the member function
4136 <function>end
</function>, it simply declares that method as
4137 <literal>virtual
</literal> abstract.
</para></listitem>
4140 <para>The cyclic dependency is solved through template-template
4141 parameters.
<classname>Node_Update
</classname> is parametrized by
4142 the tree's node iterators, its comparison functor, and its
4143 allocator type. Thus, instantiations of
4144 <classname>Node_Update
</classname> have all information
4147 <para>This library assumes that constructing a metadata object and
4148 modifying it are exception free. Suppose that during some method,
4149 say
<classname>insert
</classname>, a metadata-related operation
4150 (e.g., changing the value of a metadata) throws an exception. Ack!
4151 Rolling back the method is unusually complex.
</para>
4153 <para>Previously, a distinction was made between redundant
4154 policies and null policies. Node invariants show a
4155 case where null policies are required.
</para>
4157 <para>Assume a regular tree is required, one which need not
4158 support order statistics or interval overlap queries.
4159 Seemingly, in this case a redundant policy - a policy which
4160 doesn't affect nodes' contents would suffice. This, would lead
4161 to the following drawbacks:
</para>
4164 <listitem><para>Each node would carry a useless metadata object, wasting
4165 space.
</para></listitem>
4167 <listitem><para>The tree cannot know if its
4168 <classname>Node_Update
</classname> policy actually modifies a
4169 node's metadata (this is halting reducible). In the graphic
4170 below, assume the shaded node is inserted. The tree would have
4171 to traverse the useless path shown to the root, applying
4172 redundant updates all the way.
</para></listitem>
4175 <title>Useless update path
</title>
4178 <imagedata align=
"center" format=
"PNG" scale=
"100"
4179 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_rationale_null_node_updator.png"/>
4182 <phrase>Useless update path
</phrase>
4188 <para>A null policy class,
<classname>null_node_update
</classname>
4189 solves both these problems. The tree detects that node
4190 invariants are irrelevant, and defines all accordingly.
</para>
4196 <section xml:
id=
"container.tree.details.split">
4197 <info><title>Split and Join
</title></info>
4199 <para>Tree-based containers support split and join methods.
4200 It is possible to split a tree so that it passes
4201 all nodes with keys larger than a given key to a different
4202 tree. These methods have the following advantages over the
4203 alternative of externally inserting to the destination
4204 tree and erasing from the source tree:
</para>
4207 <listitem><para>These methods are efficient - red-black trees are split
4208 and joined in poly-logarithmic complexity; ordered-vector
4209 trees are split and joined at linear complexity. The
4210 alternatives have super-linear complexity.
</para></listitem>
4212 <listitem><para>Aside from orders of growth, these operations perform
4213 few allocations and de-allocations. For red-black trees, allocations are not performed,
4214 and the methods are exception-free.
</para></listitem>
4218 </section> <!-- details -->
4220 </section> <!-- tree -->
4223 <section xml:
id=
"pbds.design.container.trie">
4224 <info><title>Trie
</title></info>
4226 <section xml:
id=
"container.trie.interface">
4227 <info><title>Interface
</title></info>
4229 <para>The trie-based container has the following declaration:
</para>
4231 template
<typename Key,
4233 typename Cmp_Fn = std::less
<Key
>,
4234 typename Tag = pat_trie_tag,
4235 template
<typename Const_Node_Iterator,
4236 typename Node_Iterator,
4237 typename E_Access_Traits_,
4238 typename Allocator_
>
4239 class Node_Update = null_node_update,
4240 typename Allocator = std::allocator
<char
> >
4244 <para>The parameters have the following meaning:
</para>
4247 <listitem><para><classname>Key
</classname> is the key type.
</para></listitem>
4249 <listitem><para><classname>Mapped
</classname> is the mapped-policy.
</para></listitem>
4251 <listitem><para><classname>E_Access_Traits
</classname> is described in below.
</para></listitem>
4253 <listitem><para><classname>Tag
</classname> specifies which underlying data structure
4254 to use, and is described shortly.
</para></listitem>
4256 <listitem><para><classname>Node_Update
</classname> is a policy for updating node
4257 invariants. This is described below.
</para></listitem>
4259 <listitem><para><classname>Allocator
</classname> is an allocator
4260 type.
</para></listitem>
4263 <para>The
<classname>Tag
</classname> parameter specifies which underlying
4264 data structure to use. Instantiating it by
<classname>pat_trie_tag
</classname>, specifies an
4265 underlying PATRICIA trie (explained shortly); any other tag is
4266 currently illegal.
</para>
4268 <para>Following is a description of a (PATRICIA) trie
4269 (this implementation follows
<xref linkend=
"biblio.okasaki98mereable"/> and
4270 <xref linkend=
"biblio.filliatre2000ptset"/>).
4273 <para>A (PATRICIA) trie is similar to a tree, but with the
4274 following differences:
</para>
4277 <listitem><para>It explicitly views keys as a sequence of elements.
4278 E.g., a trie can view a string as a sequence of
4279 characters; a trie can view a number as a sequence of
4280 bits.
</para></listitem>
4282 <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4283 +
1, where n is the number of distinct
4284 elements.
</para></listitem>
4286 <listitem><para>It stores values only at leaf nodes.
</para></listitem>
4288 <listitem><para>Internal nodes have the properties that A) each has at
4289 least two children, and B) each shares the same prefix with
4290 any of its descendant.
</para></listitem>
4293 <para>A (PATRICIA) trie has some useful properties:
</para>
4296 <listitem><para>It can be configured to use large node fan-out, giving it
4297 very efficient find performance (albeit at insertion
4298 complexity and size).
</para></listitem>
4300 <listitem><para>It works well for common-prefix keys.
</para></listitem>
4302 <listitem><para>It can support efficiently queries such as which
4303 keys match a certain prefix. This is sometimes useful in file
4304 systems and routers, and for
"type-ahead" aka predictive text matching
4305 on mobile devices.
</para></listitem>
4311 <section xml:
id=
"container.trie.details">
4312 <info><title>Details
</title></info>
4314 <section xml:
id=
"container.trie.details.etraits">
4315 <info><title>Element Access Traits
</title></info>
4317 <para>A trie inherently views its keys as sequences of elements.
4318 For example, a trie can view a string as a sequence of
4319 characters. A trie needs to map each of n elements to a
4320 number in {
0, n -
1}. For example, a trie can map a
4321 character
<varname>c
</varname> to
4322 <programlisting>static_cast
<size_t
>(c)
</programlisting>.
</para>
4324 <para>Seemingly, then, a trie can assume that its keys support
4325 (const) iterators, and that the
<classname>value_type
</classname> of this
4326 iterator can be cast to a
<classname>size_t
</classname>. There are several
4327 reasons, though, to decouple the mechanism by which the trie
4328 accesses its keys' elements from the trie:
</para>
4331 <listitem><para>In some cases, the numerical value of an element is
4332 inappropriate. Consider a trie storing DNA strings. It is
4333 logical to use a trie with a fan-out of
5 =
1 + |{'A', 'C',
4334 'G', 'T'}|. This requires mapping 'T' to
3, though.
</para></listitem>
4336 <listitem><para>In some cases the keys' iterators are different than what
4337 is needed. For example, a trie can be used to search for
4338 common suffixes, by using strings'
4339 <classname>reverse_iterator
</classname>. As another example, a trie mapping
4340 UNICODE strings would have a huge fan-out if each node would
4341 branch on a UNICODE character; instead, one can define an
4342 iterator iterating over
8-bit (or less) groups.
</para></listitem>
4346 consequently, parametrized by
<classname>E_Access_Traits
</classname> -
4347 traits which instruct how to access sequences' elements.
4348 <classname>string_trie_e_access_traits
</classname>
4349 is a traits class for strings. Each such traits define some
4352 typename E_Access_Traits::const_iterator
4355 <para>is a const iterator iterating over a key's elements. The
4356 traits class must also define methods for obtaining an iterator
4357 to the first and last element of a key.
</para>
4359 <para>The graphic below shows a
4360 (PATRICIA) trie resulting from inserting the words:
"I wish
4361 that I could ever see a poem lovely as a trie" (which,
4362 unfortunately, does not rhyme).
</para>
4364 <para>The leaf nodes contain values; each internal node contains
4365 two
<classname>typename E_Access_Traits::const_iterator
</classname>
4366 objects, indicating the maximal common prefix of all keys in
4367 the sub-tree. For example, the shaded internal node roots a
4368 sub-tree with leafs
"a" and
"as". The maximal common prefix is
4369 "a". The internal node contains, consequently, to const
4370 iterators, one pointing to
<varname>'a'
</varname>, and the other to
4371 <varname>'s'
</varname>.
</para>
4374 <title>A PATRICIA trie
</title>
4377 <imagedata align=
"center" format=
"PNG" scale=
"100"
4378 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_pat_trie.png"/>
4381 <phrase>A PATRICIA trie
</phrase>
4388 <section xml:
id=
"container.trie.details.node">
4389 <info><title>Node Invariants
</title></info>
4391 <para>Trie-based containers support node invariants, as do
4392 tree-based containers. There are two minor
4393 differences, though, which, unfortunately, thwart sharing them
4394 sharing the same node-updating policies:
</para>
4398 <para>A trie's
<classname>Node_Update
</classname> template-template
4399 parameter is parametrized by
<classname>E_Access_Traits
</classname>, while
4400 a tree's
<classname>Node_Update
</classname> template-template parameter is
4401 parametrized by
<classname>Cmp_Fn
</classname>.
</para></listitem>
4403 <listitem><para>Tree-based containers store values in all nodes, while
4404 trie-based containers (at least in this implementation) store
4405 values in leafs.
</para></listitem>
4408 <para>The graphic below shows the scheme, as well as some predefined
4409 policies (which are explained below).
</para>
4412 <title>A trie and its update policy
</title>
4415 <imagedata align=
"center" format=
"PNG" scale=
"100"
4416 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_trie_node_updator_policy_cd.png"/>
4419 <phrase>A trie and its update policy
</phrase>
4425 <para>This library offers the following pre-defined trie node
4426 updating policies:
</para>
4431 <classname>trie_order_statistics_node_update
</classname>
4432 supports order statistics.
4436 <listitem><para><classname>trie_prefix_search_node_update
</classname>
4437 supports searching for ranges that match a given prefix.
</para></listitem>
4439 <listitem><para><classname>null_node_update
</classname>
4440 is the null node updater.
</para></listitem>
4445 <section xml:
id=
"container.trie.details.split">
4446 <info><title>Split and Join
</title></info>
4447 <para>Trie-based containers support split and join methods; the
4448 rationale is equal to that of tree-based containers supporting
4449 these methods.
</para>
4452 </section> <!-- details -->
4454 </section> <!-- trie -->
4456 <!-- list_update -->
4457 <section xml:
id=
"pbds.design.container.list">
4458 <info><title>List
</title></info>
4460 <section xml:
id=
"container.list.interface">
4461 <info><title>Interface
</title></info>
4463 <para>The list-based container has the following declaration:
</para>
4465 template
<typename Key,
4467 typename Eq_Fn = std::equal_to
<Key
>,
4468 typename Update_Policy = move_to_front_lu_policy
<>,
4469 typename Allocator = std::allocator
<char
> >
4473 <para>The parameters have the following meaning:
</para>
4478 <classname>Key
</classname> is the key type.
4484 <classname>Mapped
</classname> is the mapped-policy.
4490 <classname>Eq_Fn
</classname> is a key equivalence functor.
4496 <classname>Update_Policy
</classname> is a policy updating positions in
4497 the list based on access patterns. It is described in the
4498 following subsection.
4504 <classname>Allocator
</classname> is an allocator type.
4509 <para>A list-based associative container is a container that
4510 stores elements in a linked-list. It does not order the elements
4511 by any particular order related to the keys. List-based
4512 containers are primarily useful for creating
"multimaps". In fact,
4513 list-based containers are designed in this library expressly for
4514 this purpose.
</para>
4516 <para>List-based containers might also be useful for some rare
4517 cases, where a key is encapsulated to the extent that only
4518 key-equivalence can be tested. Hash-based containers need to know
4519 how to transform a key into a size type, and tree-based containers
4520 need to know if some key is larger than another. List-based
4521 associative containers, conversely, only need to know if two keys
4522 are equivalent.
</para>
4524 <para>Since a list-based associative container does not order
4525 elements by keys, is it possible to order the list in some
4526 useful manner? Remarkably, many on-line competitive
4527 algorithms exist for reordering lists to reflect access
4528 prediction. (See
<xref linkend=
"biblio.motwani95random"/> and
<xref linkend=
"biblio.andrew04mtf"/>).
4533 <section xml:
id=
"container.list.details">
4534 <info><title>Details
</title></info>
4537 <section xml:
id=
"container.list.details.ds">
4538 <info><title>Underlying Data Structure
</title></info>
4540 <para>The graphic below shows a
4541 simple list of integer keys. If we search for the integer
6, we
4542 are paying an overhead: the link with key
6 is only the fifth
4543 link; if it were the first link, it could be accessed
4547 <title>A simple list
</title>
4550 <imagedata align=
"center" format=
"PNG" scale=
"100"
4551 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_simple_list.png"/>
4554 <phrase>A simple list
</phrase>
4559 <para>List-update algorithms reorder lists as elements are
4560 accessed. They try to determine, by the access history, which
4561 keys to move to the front of the list. Some of these algorithms
4562 require adding some metadata alongside each entry.
</para>
4564 <para>For example, in the graphic below label A shows the counter
4565 algorithm. Each node contains both a key and a count metadata
4566 (shown in bold). When an element is accessed (e.g.
6) its count is
4567 incremented, as shown in label B. If the count reaches some
4568 predetermined value, say
10, as shown in label C, the count is set
4569 to
0 and the node is moved to the front of the list, as in label
4574 <title>The counter algorithm
</title>
4577 <imagedata align=
"center" format=
"PNG" scale=
"100"
4578 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_list_update.png"/>
4581 <phrase>The counter algorithm
</phrase>
4589 <section xml:
id=
"container.list.details.policies">
4590 <info><title>Policies
</title></info>
4592 <para>this library allows instantiating lists with policies
4593 implementing any algorithm moving nodes to the front of the
4594 list (policies implementing algorithms interchanging nodes are
4595 unsupported).
</para>
4597 <para>Associative containers based on lists are parametrized by a
4598 <classname>Update_Policy
</classname> parameter. This parameter defines the
4599 type of metadata each node contains, how to create the
4600 metadata, and how to decide, using this metadata, whether to
4601 move a node to the front of the list. A list-based associative
4602 container object derives (publicly) from its update policy.
4605 <para>An instantiation of
<classname>Update_Policy
</classname> must define
4606 internally
<classname>update_metadata
</classname> as the metadata it
4607 requires. Internally, each node of the list contains, besides
4608 the usual key and data, an instance of
<classname>typename
4609 Update_Policy::update_metadata
</classname>.
</para>
4611 <para>An instantiation of
<classname>Update_Policy
</classname> must define
4612 internally two operators:
</para>
4618 operator()(update_metadata
&);
4621 <para>The first is called by the container object, when creating a
4622 new node, to create the node's metadata. The second is called
4623 by the container object, when a node is accessed (
4624 when a find operation's key is equivalent to the key of the
4625 node), to determine whether to move the node to the front of
4629 <para>The library contains two predefined implementations of
4630 list-update policies. The first
4631 is
<classname>lu_counter_policy
</classname>, which implements the
4632 counter algorithm described above. The second is
4633 <classname>lu_move_to_front_policy
</classname>,
4634 which unconditionally move an accessed element to the front of
4635 the list. The latter type is very useful in this library,
4636 since there is no need to associate metadata with each element.
4637 (See
<xref linkend=
"biblio.andrew04mtf"/>
4642 <section xml:
id=
"container.list.details.mapped">
4643 <info><title>Use in Multimaps
</title></info>
4645 <para>In this library, there are no equivalents for the standard's
4646 multimaps and multisets; instead one uses an associative
4647 container mapping primary keys to secondary keys.
</para>
4649 <para>List-based containers are especially useful as associative
4650 containers for secondary keys. In fact, they are implemented
4651 here expressly for this purpose.
</para>
4653 <para>To begin with, these containers use very little per-entry
4654 structure memory overhead, since they can be implemented as
4655 singly-linked lists. (Arrays use even lower per-entry memory
4656 overhead, but they are less flexible in moving around entries,
4657 and have weaker invalidation guarantees).
</para>
4659 <para>More importantly, though, list-based containers use very
4660 little per-container memory overhead. The memory overhead of an
4661 empty list-based container is practically that of a pointer.
4662 This is important for when they are used as secondary
4663 associative-containers in situations where the average ratio of
4664 secondary keys to primary keys is low (or even
1).
</para>
4666 <para>In order to reduce the per-container memory overhead as much
4667 as possible, they are implemented as closely as possible to
4668 singly-linked lists.
</para>
4673 List-based containers do not store internally the number
4674 of values that they hold. This means that their
<function>size
</function>
4675 method has linear complexity (just like
<classname>std::list
</classname>).
4676 Note that finding the number of equivalent-key values in a
4677 standard multimap also has linear complexity (because it must be
4678 done, via
<function>std::distance
</function> of the
4679 multimap's
<function>equal_range
</function> method), but usually with
4686 Most associative-container objects each hold a policy
4687 object (a hash-based container object holds a
4688 hash functor). List-based containers, conversely, only have
4689 class-wide policy objects.
4697 </section> <!-- details -->
4699 </section> <!-- list -->
4702 <!-- priority_queue -->
4703 <section xml:
id=
"pbds.design.container.priority_queue">
4704 <info><title>Priority Queue
</title></info>
4706 <section xml:
id=
"container.priority_queue.interface">
4707 <info><title>Interface
</title></info>
4709 <para>The priority queue container has the following
4713 template
<typename Value_Type,
4714 typename Cmp_Fn = std::less
<Value_Type
>,
4715 typename Tag = pairing_heap_tag,
4716 typename Allocator = std::allocator
<char
> >
4717 class priority_queue;
4720 <para>The parameters have the following meaning:
</para>
4723 <listitem><para><classname>Value_Type
</classname> is the value type.
</para></listitem>
4725 <listitem><para><classname>Cmp_Fn
</classname> is a value comparison functor
</para></listitem>
4727 <listitem><para><classname>Tag
</classname> specifies which underlying data structure
4728 to use.
</para></listitem>
4730 <listitem><para><classname>Allocator
</classname> is an allocator
4731 type.
</para></listitem>
4734 <para>The
<classname>Tag
</classname> parameter specifies which underlying
4735 data structure to use. Instantiating it by
<classname>pairing_heap_tag
</classname>,
<classname>binary_heap_tag
</classname>,
4736 <classname>binomial_heap_tag
</classname>,
4737 <classname>rc_binomial_heap_tag
</classname>,
4738 or
<classname>thin_heap_tag
</classname>,
4739 specifies, respectively,
4740 an underlying pairing heap (
<xref linkend=
"biblio.fredman86pairing"/>),
4741 binary heap (
<xref linkend=
"biblio.clrs2001"/>),
4742 binomial heap (
<xref linkend=
"biblio.clrs2001"/>),
4743 a binomial heap with a redundant binary counter (
<xref linkend=
"biblio.maverik_lowerbounds"/>),
4744 or a thin heap (
<xref linkend=
"biblio.kt99fat_heaps"/>).
4748 As mentioned in the tutorial,
4749 <classname>__gnu_pbds::priority_queue
</classname> shares most of the
4750 same interface with
<classname>std::priority_queue
</classname>.
4751 E.g. if
<varname>q
</varname> is a priority queue of type
4752 <classname>Q
</classname>, then
<function>q.top()
</function> will
4753 return the
"largest" value in the container (according to
4755 Q::cmp_fn
</classname>).
<classname>__gnu_pbds::priority_queue
</classname>
4756 has a larger (and very slightly different) interface than
4757 <classname>std::priority_queue
</classname>, however, since typically
4758 <classname>push
</classname> and
<classname>pop
</classname> are deemed
4759 insufficient for manipulating priority-queues.
</para>
4761 <para>Different settings require different priority-queue
4762 implementations which are described in later; see traits
4763 discusses ways to differentiate between the different traits of
4764 different implementations.
</para>
4769 <section xml:
id=
"container.priority_queue.details">
4770 <info><title>Details
</title></info>
4772 <section xml:
id=
"container.priority_queue.details.iterators">
4773 <info><title>Iterators
</title></info>
4775 <para>There are many different underlying-data structures for
4776 implementing priority queues. Unfortunately, most such
4777 structures are oriented towards making
<function>push
</function> and
4778 <function>top
</function> efficient, and consequently don't allow efficient
4779 access of other elements: for instance, they cannot support an efficient
4780 <function>find
</function> method. In the use case where it
4781 is important to both access and
"do something with" an
4782 arbitrary value, one would be out of luck. For example, many graph algorithms require
4783 modifying a value (typically increasing it in the sense of the
4784 priority queue's comparison functor).
</para>
4786 <para>In order to access and manipulate an arbitrary value in a
4787 priority queue, one needs to reference the internals of the
4788 priority queue from some form of an associative container -
4789 this is unavoidable. Of course, in order to maintain the
4790 encapsulation of the priority queue, this needs to be done in a
4791 way that minimizes exposure to implementation internals.
</para>
4793 <para>In this library the priority queue's
<function>insert
</function>
4794 method returns an iterator, which if valid can be used for subsequent
<function>modify
</function> and
4795 <function>erase
</function> operations. This both preserves the priority
4796 queue's encapsulation, and allows accessing arbitrary values (since the
4797 returned iterators from the
<function>push
</function> operation can be
4798 stored in some form of associative container).
</para>
4800 <para>Priority queues' iterators present a problem regarding their
4801 invalidation guarantees. One assumes that calling
4802 <function>operator++
</function> on an iterator will associate it
4803 with the
"next" value. Priority-queues are
4804 self-organizing: each operation changes what the
"next" value
4805 means. Consequently, it does not make sense that
<function>push
</function>
4806 will return an iterator that can be incremented - this can have
4807 no possible use. Also, as in the case of hash-based containers,
4808 it is awkward to define if a subsequent
<function>push
</function> operation
4809 invalidates a prior returned iterator: it invalidates it in the
4810 sense that its
"next" value is not related to what it
4811 previously considered to be its
"next" value. However, it might not
4812 invalidate it, in the sense that it can be
4813 de-referenced and used for
<function>modify
</function> and
<function>erase
</function>
4816 <para>Similarly to the case of the other unordered associative
4817 containers, this library uses a distinction between
4818 point-type and range type iterators. A priority queue's
<classname>iterator
</classname> can always be
4819 converted to a
<classname>point_iterator
</classname>, and a
4820 <classname>const_iterator
</classname> can always be converted to a
4821 <classname>point_const_iterator
</classname>.
</para>
4823 <para>The following snippet demonstrates manipulating an arbitrary
4826 // A priority queue of integers.
4827 priority_queue
<int
> p;
4829 // Insert some values into the priority queue.
4830 priority_queue
<int
>::point_iterator it = p.push(
0);
4835 // Now modify a value.
4838 assert(p.top() ==
3);
4842 <para>It should be noted that an alternative design could embed an
4843 associative container in a priority queue. Could, but most
4844 probably should not. To begin with, it should be noted that one
4845 could always encapsulate a priority queue and an associative
4846 container mapping values to priority queue iterators with no
4847 performance loss. One cannot, however,
"un-encapsulate" a priority
4848 queue embedding an associative container, which might lead to
4849 performance loss. Assume, that one needs to associate each value
4850 with some data unrelated to priority queues. Then using
4851 this library's design, one could use an
4852 associative container mapping each value to a pair consisting of
4853 this data and a priority queue's iterator. Using the embedded
4854 method would need to use two associative containers. Similar
4855 problems might arise in cases where a value can reside
4856 simultaneously in many priority queues.
</para>
4861 <section xml:
id=
"container.priority_queue.details.d">
4862 <info><title>Underlying Data Structure
</title></info>
4864 <para>There are three main implementations of priority queues: the
4865 first employs a binary heap, typically one which uses a
4866 sequence; the second uses a tree (or forest of trees), which is
4867 typically less structured than an associative container's tree;
4868 the third simply uses an associative container. These are
4869 shown in the graphic below, in labels A1 and A2, label B, and label C.
</para>
4872 <title>Underlying Priority-Queue Data-Structures.
</title>
4875 <imagedata align=
"center" format=
"PNG" scale=
"100"
4876 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_priority_queue_different_underlying_dss.png"/>
4879 <phrase>Underlying Priority-Queue Data-Structures.
</phrase>
4884 <para>Roughly speaking, any value that is both pushed and popped
4885 from a priority queue must incur a logarithmic expense (in the
4886 amortized sense). Any priority queue implementation that would
4887 avoid this, would violate known bounds on comparison-based
4888 sorting (see
<xref linkend=
"biblio.clrs2001"/> and
<xref linkend=
"biblio.brodal96priority"/>).
4891 <para>Most implementations do
4892 not differ in the asymptotic amortized complexity of
4893 <function>push
</function> and
<function>pop
</function> operations, but they differ in
4894 the constants involved, in the complexity of other operations
4895 (e.g.,
<function>modify
</function>), and in the worst-case
4896 complexity of single operations. In general, the more
4897 "structured" an implementation (i.e., the more internal
4898 invariants it possesses) - the higher its amortized complexity
4899 of
<function>push
</function> and
<function>pop
</function> operations.
</para>
4901 <para>This library implements different algorithms using a
4902 single class:
<classname>priority_queue
</classname>.
4903 Instantiating the
<classname>Tag
</classname> template parameter,
"selects"
4904 the implementation:
</para>
4908 Instantiating
<classname>Tag = binary_heap_tag
</classname> creates
4909 a binary heap of the form in represented in the graphic with labels A1 or A2. The former is internally
4910 selected by priority_queue
4911 if
<classname>Value_Type
</classname> is instantiated by a primitive type
4912 (e.g., an
<type>int
</type>); the latter is
4913 internally selected for all other types (e.g.,
4914 <classname>std::string
</classname>). This implementations is relatively
4915 unstructured, and so has good
<classname>push
</classname> and
<classname>pop
</classname>
4916 performance; it is the
"best-in-kind" for primitive
4917 types, e.g.,
<type>int
</type>s. Conversely, it has
4918 high worst-case performance, and can support only linear-time
4919 <function>modify
</function> and
<function>erase
</function> operations.
</para></listitem>
4921 <listitem><para>Instantiating
<classname>Tag =
4922 pairing_heap_tag
</classname> creates a pairing heap of the form
4923 in represented by label B in the graphic above. This
4924 implementations too is relatively unstructured, and so has good
4925 <function>push
</function> and
<function>pop
</function>
4926 performance; it is the
"best-in-kind" for non-primitive types,
4927 e.g.,
<classname>std:string
</classname>s. It also has very good
4928 worst-case
<function>push
</function> and
4929 <function>join
</function> performance (O(
1)), but has high
4930 worst-case
<function>pop
</function>
4931 complexity.
</para></listitem>
4933 <listitem><para>Instantiating
<classname>Tag =
4934 binomial_heap_tag
</classname> creates a binomial heap of the
4935 form repsented by label B in the graphic above. This
4936 implementations is more structured than a pairing heap, and so
4937 has worse
<function>push
</function> and
<function>pop
</function>
4938 performance. Conversely, it has sub-linear worst-case bounds for
4939 <function>pop
</function>, e.g., and so it might be preferred in
4940 cases where responsiveness is important.
</para></listitem>
4942 <listitem><para>Instantiating
<classname>Tag =
4943 rc_binomial_heap_tag
</classname> creates a binomial heap of the
4944 form represented in label B above, accompanied by a redundant
4945 counter which governs the trees. This implementations is
4946 therefore more structured than a binomial heap, and so has worse
4947 <function>push
</function> and
<function>pop
</function>
4948 performance. Conversely, it guarantees O(
1)
4949 <function>push
</function> complexity, and so it might be
4950 preferred in cases where the responsiveness of a binomial heap
4951 is insufficient.
</para></listitem>
4953 <listitem><para>Instantiating
<classname>Tag =
4954 thin_heap_tag
</classname> creates a thin heap of the form
4955 represented by the label B in the graphic above. This
4956 implementations too is more structured than a pairing heap, and
4957 so has worse
<function>push
</function> and
4958 <function>pop
</function> performance. Conversely, it has better
4959 worst-case and identical amortized complexities than a Fibonacci
4960 heap, and so might be more appropriate for some graph
4961 algorithms.
</para></listitem>
4964 <para>Of course, one can use any order-preserving associative
4965 container as a priority queue, as in the graphic above label C, possibly by creating an adapter class
4966 over the associative container (much as
4967 <classname>std::priority_queue
</classname> can adapt
<classname>std::vector
</classname>).
4968 This has the advantage that no cross-referencing is necessary
4969 at all; the priority queue itself is an associative container.
4970 Most associative containers are too structured to compete with
4971 priority queues in terms of
<function>push
</function> and
<function>pop
</function>
4978 <section xml:
id=
"container.priority_queue.details.traits">
4979 <info><title>Traits
</title></info>
4981 <para>It would be nice if all priority queues could
4982 share exactly the same behavior regardless of implementation. Sadly, this is not possible. Just one for instance is in join operations: joining
4983 two binary heaps might throw an exception (not corrupt
4984 any of the heaps on which it operates), but joining two pairing
4985 heaps is exception free.
</para>
4987 <para>Tags and traits are very useful for manipulating generic
4988 types.
<classname>__gnu_pbds::priority_queue
</classname>
4989 publicly defines
<classname>container_category
</classname> as one of the tags. Given any
4990 container
<classname>Cntnr
</classname>, the tag of the underlying
4991 data structure can be found via
<classname>typename
4992 Cntnr::container_category
</classname>; this is one of the possible tags shown in the graphic below.
4996 <title>Priority-Queue Data-Structure Tags.
</title>
4999 <imagedata align=
"center" format=
"PNG" scale=
"100"
5000 fileref=
"/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_priority_queue_tag_hierarchy.png"/>
5003 <phrase>Priority-Queue Data-Structure Tags.
</phrase>
5009 <para>Additionally, a traits mechanism can be used to query a
5010 container type for its attributes. Given any container
5011 <classname>Cntnr
</classname>, then
<programlisting>__gnu_pbds::container_traits
<Cntnr
></programlisting>
5012 is a traits class identifying the properties of the
5015 <para>To find if a container might throw if two of its objects are
5018 container_traits
<Cntnr
>::split_join_can_throw
5023 Different priority-queue implementations have different invalidation guarantees. This is
5024 especially important, since there is no way to access an arbitrary
5025 value of priority queues except for iterators. Similarly to
5026 associative containers, one can use
5028 container_traits
<Cntnr
>::invalidation_guarantee
5030 to get the invalidation guarantee type of a priority queue.
</para>
5032 <para>It is easy to understand from the graphic above, what
<classname>container_traits
<Cntnr
>::invalidation_guarantee
</classname>
5033 will be for different implementations. All implementations of
5034 type represented by label B have
<classname>point_invalidation_guarantee
</classname>:
5035 the container can freely internally reorganize the nodes -
5036 range-type iterators are invalidated, but point-type iterators
5037 are always valid. Implementations of type represented by labels A1 and A2 have
<classname>basic_invalidation_guarantee
</classname>:
5038 the container can freely internally reallocate the array - both
5039 point-type and range-type iterators might be invalidated.
</para>
5042 This has major implications, and constitutes a good reason to avoid
5043 using binary heaps. A binary heap can perform
<function>modify
</function>
5044 or
<function>erase
</function> efficiently given a valid point-type
5045 iterator. However, in order to supply it with a valid point-type
5046 iterator, one needs to iterate (linearly) over all
5047 values, then supply the relevant iterator (recall that a
5048 range-type iterator can always be converted to a point-type
5049 iterator). This means that if the number of
<function>modify
</function> or
5050 <function>erase
</function> operations is non-negligible (say
5051 super-logarithmic in the total sequence of operations) - binary
5052 heaps will perform badly.
5057 </section> <!-- details -->
5059 </section> <!-- priority_queue -->
5063 </section> <!-- container -->
5065 </section> <!-- design -->
5070 <xi:include xmlns:
xi=
"http://www.w3.org/2001/XInclude" parse=
"xml"
5071 href=
"test_policy_data_structures.xml">
5074 <!-- S05: Reference/Acknowledgments -->
5075 <section xml:
id=
"pbds.ack">
5076 <info><title>Acknowledgments
</title></info>
5077 <?dbhtml filename=
"policy_data_structures_biblio.html"?>
5080 Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5081 Laboratories), and Benjamin Kosnik (Red Hat).
5085 This library was partially written at
5086 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink" xlink:
href=
"http://www.haifa.il.ibm.com/">IBM's Haifa Research Labs
</link>.
5087 It is based heavily on policy-based design and uses many useful
5088 techniques from Modern C++ Design: Generic Programming and Design
5089 Patterns Applied by Andrei Alexandrescu.
5093 Two ideas are borrowed from the SGI-STL implementation:
5099 The prime-based resize policies use a list of primes taken from
5100 the SGI-STL implementation.
5106 The red-black trees contain both a root node and a header node
5107 (containing metadata), connected in a way that forward and
5108 reverse iteration can be performed efficiently.
5114 Some test utilities borrow ideas from
5115 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink" xlink:
href=
"http://www.boost.org/doc/libs/release/libs/timer/index.html">boost::timer
</link>.
5119 We would like to thank Scott Meyers for useful comments (without
5120 attributing to him any flaws in the design or implementation of the
5123 <para>We would like to thank Matt Austern for the suggestion to
5124 include tries.
</para>
5127 <!-- S06: Biblio -->
5128 <bibliography xml:
id=
"pbds.biblio">
5134 <?dbhtml filename=
"policy_data_structures_biblio.html"?>
5137 <biblioentry xml:
id=
"biblio.abrahams97exception">
5139 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5140 xlink:
href=
"http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1997/N1075.pdf">
5141 STL Exception Handling Contract
5166 <biblioentry xml:
id=
"biblio.alexandrescu01modern">
5168 Modern C++ Design: Generic Programming and Design Patterns Applied
5187 Addison-Wesley Publishing Company
5194 <biblioentry xml:
id=
"biblio.andrew04mtf">
5196 MTF, Bit, and COMB: A Guide to Deterministic and Randomized
5197 Algorithms for the List Update Problem
5226 <biblioentry xml:
id=
"biblio.austern00noset">
5228 Why You Shouldn't Use set - and What You Should Use Instead
5253 <biblioentry xml:
id=
"biblio.austern01htprop">
5255 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5256 xlink:
href=
"http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2001/n1326.html">
5257 A Proposal to Add Hashtables to the Standard Library
5283 <biblioentry xml:
id=
"biblio.austern98segmentedit">
5285 Segmented iterators and hierarchical algorithms
5310 <biblioentry xml:
id=
"biblio.dawestimer">
5312 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5313 xlink:
href=
"www.boost.org/doc/libs/release/libs/timer/">
5337 <biblioentry xml:
id=
"biblio.clearypool">
5339 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5340 xlink:
href=
"www.boost.org/doc/libs/release/libs/pool/">
5365 <biblioentry xml:
id=
"biblio.maddocktraits">
5367 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5368 xlink:
href=
"www.boost.org/doc/libs/release/libs/type_traits/">
5369 Boost Type Traits Library
5402 <biblioentry xml:
id=
"biblio.brodal96priority">
5404 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5405 xlink:
href=
"http://portal.acm.org/citation.cfm?id=313883">
5406 Worst-case efficient priority queues
5424 <biblioentry xml:
id=
"biblio.bulkamayheweff">
5426 Efficient C++ Programming Techniques
5457 Addison-Wesley Publishing Company
5463 <biblioentry xml:
id=
"biblio.clrs2001">
5465 Introduction to Algorithms,
2nd edition
5523 <biblioentry xml:
id=
"biblio.dubhashi98neg">
5525 Balls and bins: A study in negative dependence
5555 Random Structures and Algorithms
13
5562 <biblioentry xml:
id=
"biblio.fagin79extendible">
5564 Extendible hashing - a fast access method for dynamic files
5615 ACM Trans. Database Syst.
4
5623 <biblioentry xml:
id=
"biblio.filliatre2000ptset">
5625 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5626 xlink:
href=
"http://cristal.inria.fr/~frisch/icfp06_contest/advtr/applyOmatic/ptset.ml">
5627 Ptset: Sets of integers implemented as Patricia trees
5650 <biblioentry xml:
id=
"biblio.fredman86pairing">
5652 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5653 xlink:
href=
"http://www.cs.cmu.edu/~sleator/papers/pairing-heaps.pdf">
5654 The pairing heap: a new form of self-adjusting heap
5706 <biblioentry xml:
id=
"biblio.gof">
5708 Design Patterns - Elements of Reusable Object-Oriented Software
5757 Addison-Wesley Publishing Company
5764 <biblioentry xml:
id=
"biblio.garg86order">
5766 Order-preserving key transformations
5796 Trans. Database Syst.
11
5802 <biblioentry xml:
id=
"biblio.hyslop02making">
5804 Making a real hash of things
5841 <biblioentry xml:
id=
"biblio.jossutis01stl">
5843 The C++ Standard Library - A Tutorial and Reference
5861 Addison-Wesley Publishing Company
5867 <biblioentry xml:
id=
"biblio.kt99fat_heaps">
5869 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
5870 xlink:
href=
"http://www.cs.princeton.edu/research/techreps/TR-597-99">
5871 New Heap Data Structures
5904 <biblioentry xml:
id=
"biblio.kleft00sets">
5906 Are Set Iterators Mutable or Immutable?
5943 <biblioentry xml:
id=
"biblio.knuth98sorting">
5945 The Art of Computer Programming - Sorting and Searching
5964 Addison-Wesley Publishing Company
5970 <biblioentry xml:
id=
"biblio.liskov98data">
5972 Data abstraction and hierarchy
5997 <biblioentry xml:
id=
"biblio.litwin80lh">
5999 Linear hashing: A new tool for file and table addressing
6018 Proceedings of International Conference on Very Large Data Bases
6024 <biblioentry xml:
id=
"biblio.maverik_lowerbounds">
6026 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
6027 xlink:
href=
"http://magic.aladdin.cs.cmu.edu/2005/08/01/deamortization-part-2-binomial-heaps">
6028 Deamortization - Part
2: Binomial Heaps
6048 <biblioentry xml:
id=
"biblio.meyers96more">
6050 More Effective C++:
35 New Ways to Improve Your Programs and Designs
6069 Addison-Wesley Publishing Company
6075 <biblioentry xml:
id=
"biblio.meyers00nonmember">
6077 How Non-Member Functions Improve Encapsulation
6102 <biblioentry xml:
id=
"biblio.meyers01stl">
6104 Effective STL:
50 Specific Ways to Improve Your Use of the Standard Template Library
6123 Addison-Wesley Publishing Company
6129 <biblioentry xml:
id=
"biblio.meyers02both">
6131 Class Template, Member Template - or Both?
6156 <biblioentry xml:
id=
"biblio.motwani95random">
6158 Randomized Algorithms
6187 Cambridge University Press
6194 <biblioentry xml:
id=
"biblio.mscom">
6196 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
6197 xlink:
href=
"http://www.microsoft.com/com">
6198 COM: Component Model Object Technologies
6209 <biblioentry xml:
id=
"biblio.musser95rationale">
6211 Rationale for Adding Hash Tables to the C++ Standard Template Library
6231 <biblioentry xml:
id=
"biblio.musser96stltutorial">
6233 STL Tutorial and Reference Guide
6263 Addison-Wesley Publishing Company
6271 <biblioentry xml:
id=
"biblio.nelson96stlpq">
6273 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
6274 xlink:
href=
"http://www.dogma.net/markn/articles/pq_stl/priority.htm">Priority Queues and the STL
6301 <biblioentry xml:
id=
"biblio.okasaki98mereable">
6303 Fast mergeable integer maps
6338 <biblioentry xml:
id=
"biblio.sgi_stl">
6340 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
6341 xlink:
href=
"http://www.sgi.com/tech/stl">
6342 Standard Template Library Programmer's Guide
6364 <biblioentry xml:
id=
"biblio.select_man">
6366 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
6367 xlink:
href=
"http://www.scit.wlv.ac.uk/cgi-bin/mansec?3C+select">
6375 <biblioentry xml:
id=
"biblio.sleator84amortized">
6377 Amortized Efficiency of List Update Problems
6408 ACM Symposium on Theory of Computing
6414 <biblioentry xml:
id=
"biblio.sleator85self">
6416 Self-Adjusting Binary Search Trees
6448 ACM Symposium on Theory of Computing
6454 <biblioentry xml:
id=
"biblio.stepanov94standard">
6456 The Standard Template Library
6486 <biblioentry xml:
id=
"biblio.stroustrup97cpp">
6488 The C++ Programming Langugage
6507 Addison-Wesley Publishing Company
6513 <biblioentry xml:
id=
"biblio.vandevoorde2002cpptemplates">
6515 C++ Templates: The Complete Guide
6545 Addison-Wesley Publishing Company
6552 <biblioentry xml:
id=
"biblio.wickland96thirty">
6554 <link xmlns:
xlink=
"http://www.w3.org/1999/xlink"
6555 xlink:
href=
"http://myweb.wvnet.edu/~gsa00121/books/amongdead30.zip">
6556 Thirty Years Among the Dead
6576 National Psychological Institute