]> git.ipfire.org Git - thirdparty/gcc.git/blob - libstdc++-v3/doc/xml/manual/policy_data_structures.xml
d0bf146fe8aba909c79c7833b9d3f5407b9d3516
[thirdparty/gcc.git] / libstdc++-v3 / doc / xml / manual / policy_data_structures.xml
1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
2 xml:id="manual.ext.containers.pbds" xreflabel="pbds">
3 <info>
4 <title>Policy-Based Data Structures</title>
5 <keywordset>
6 <keyword>
7 ISO C++
8 </keyword>
9 <keyword>
10 policy
11 </keyword>
12 <keyword>
13 container
14 </keyword>
15 <keyword>
16 data
17 </keyword>
18 <keyword>
19 structure
20 </keyword>
21 <keyword>
22 associated
23 </keyword>
24 <keyword>
25 tree
26 </keyword>
27 <keyword>
28 trie
29 </keyword>
30 <keyword>
31 hash
32 </keyword>
33 <keyword>
34 metaprogramming
35 </keyword>
36 </keywordset>
37 </info>
38 <?dbhtml filename="policy_data_structures.html"?>
39
40 <!-- 2006-04-01 Ami Tavory -->
41 <!-- 2011-05-25 Benjamin Kosnik -->
42
43 <!-- S01: intro -->
44 <section xml:id="pbds.intro">
45 <info><title>Intro</title></info>
46
47 <para>
48 This is a library of policy-based elementary data structures:
49 associative containers and priority queues. It is designed for
50 high-performance, flexibility, semantic safety, and conformance to
51 the corresponding containers in <literal>std</literal> and
52 <literal>std::tr1</literal> (except for some points where it differs
53 by design).
54 </para>
55 <para>
56 </para>
57
58 <section xml:id="pbds.intro.issues">
59 <info><title>Performance Issues</title></info>
60 <para>
61 </para>
62
63 <para>
64 An attempt is made to categorize the wide variety of possible
65 container designs in terms of performance-impacting factors. These
66 performance factors are translated into design policies and
67 incorporated into container design.
68 </para>
69
70 <para>
71 There is tension between unravelling factors into a coherent set of
72 policies. Every attempt is made to make a minimal set of
73 factors. However, in many cases multiple factors make for long
74 template names. Every attempt is made to alias and use typedefs in
75 the source files, but the generated names for external symbols can
76 be large for binary files or debuggers.
77 </para>
78
79 <para>
80 In many cases, the longer names allow capabilities and behaviours
81 controlled by macros to also be unamibiguously emitted as distinct
82 generated names.
83 </para>
84
85 <para>
86 Specific issues found while unraveling performance factors in the
87 design of associative containers and priority queues follow.
88 </para>
89
90 <section xml:id="pbds.intro.issues.associative">
91 <info><title>Associative</title></info>
92
93 <para>
94 Associative containers depend on their composite policies to a very
95 large extent. Implicitly hard-wiring policies can hamper their
96 performance and limit their functionality. An efficient hash-based
97 container, for example, requires policies for testing key
98 equivalence, hashing keys, translating hash values into positions
99 within the hash table, and determining when and how to resize the
100 table internally. A tree-based container can efficiently support
101 order statistics, i.e. the ability to query what is the order of
102 each key within the sequence of keys in the container, but only if
103 the container is supplied with a policy to internally update
104 meta-data. There are many other such examples.
105 </para>
106
107 <para>
108 Ideally, all associative containers would share the same
109 interface. Unfortunately, underlying data structures and mapping
110 semantics differentiate between different containers. For example,
111 suppose one writes a generic function manipulating an associative
112 container.
113 </para>
114
115 <programlisting>
116 template&lt;typename Cntnr&gt;
117 void
118 some_op_sequence(Cntnr&amp; r_cnt)
119 {
120 ...
121 }
122 </programlisting>
123
124 <para>
125 Given this, then what can one assume about the instantiating
126 container? The answer varies according to its underlying data
127 structure. If the underlying data structure of
128 <literal>Cntnr</literal> is based on a tree or trie, then the order
129 of elements is well defined; otherwise, it is not, in general. If
130 the underlying data structure of <literal>Cntnr</literal> is based
131 on a collision-chaining hash table, then modifying
132 r_<literal>Cntnr</literal> will not invalidate its iterators' order;
133 if the underlying data structure is a probing hash table, then this
134 is not the case. If the underlying data structure is based on a tree
135 or trie, then a reference to the container can efficiently be split;
136 otherwise, it cannot, in general. If the underlying data structure
137 is a red-black tree, then splitting a reference to the container is
138 exception-free; if it is an ordered-vector tree, exceptions can be
139 thrown.
140 </para>
141
142 </section>
143
144 <section xml:id="pbds.intro.issues.priority_queue">
145 <info><title>Priority Que</title></info>
146
147 <para>
148 Priority queues are useful when one needs to efficiently access a
149 minimum (or maximum) value as the set of values changes.
150 </para>
151
152 <para>
153 Most useful data structures for priority queues have a relatively
154 simple structure, as they are geared toward relatively simple
155 requirements. Unfortunately, these structures do not support access
156 to an arbitrary value, which turns out to be necessary in many
157 algorithms. Say, decreasing an arbitrary value in a graph
158 algorithm. Therefore, some extra mechanism is necessary and must be
159 invented for accessing arbitrary values. There are at least two
160 alternatives: embedding an associative container in a priority
161 queue, or allowing cross-referencing through iterators. The first
162 solution adds significant overhead; the second solution requires a
163 precise definition of iterator invalidation. Which is the next
164 point...
165 </para>
166
167 <para>
168 Priority queues, like hash-based containers, store values in an
169 order that is meaningless and undefined externally. For example, a
170 <code>push</code> operation can internally reorganize the
171 values. Because of this characteristic, describing a priority
172 queues' iterator is difficult: on one hand, the values to which
173 iterators point can remain valid, but on the other, the logical
174 order of iterators can change unpredictably.
175 </para>
176
177 <para>
178 Roughly speaking, any element that is both inserted to a priority
179 queue (e.g. through <code>push</code>) and removed
180 from it (e.g., through <code>pop</code>), incurs a
181 logarithmic overhead (in the amortized sense). Different underlying
182 data structures place the actual cost differently: some are
183 optimized for amortized complexity, whereas others guarantee that
184 specific operations only have a constant cost. One underlying data
185 structure might be chosen if modifying a value is frequent
186 (Dijkstra's shortest-path algorithm), whereas a different one might
187 be chosen otherwise. Unfortunately, an array-based binary heap - an
188 underlying data structure that optimizes (in the amortized sense)
189 <code>push</code> and <code>pop</code> operations, differs from the
190 others in terms of its invalidation guarantees. Other design
191 decisions also impact the cost and placement of the overhead, at the
192 expense of more difference in the the kinds of operations that the
193 underlying data structure can support. These differences pose a
194 challenge when creating a uniform interface for priority queues.
195 </para>
196 </section>
197 </section>
198
199 <section xml:id="pbds.intro.motivation">
200 <info><title>Goals</title></info>
201
202 <para>
203 Many fine associative-container libraries were already written,
204 most notably, the C++ standard's associative containers. Why
205 then write another library? This section shows some possible
206 advantages of this library, when considering the challenges in
207 the introduction. Many of these points stem from the fact that
208 the ISO C++ process introduced associative-containers in a
209 two-step process (first standardizing tree-based containers,
210 only then adding hash-based containers, which are fundamentally
211 different), did not standardize priority queues as containers,
212 and (in our opinion) overloads the iterator concept.
213 </para>
214
215 <section xml:id="pbds.intro.motivation.associative">
216 <info><title>Associative</title></info>
217 <para>
218 </para>
219
220 <section xml:id="motivation.associative.policy">
221 <info><title>Policy Choices</title></info>
222 <para>
223 Associative containers require a relatively large number of
224 policies to function efficiently in various settings. In some
225 cases this is needed for making their common operations more
226 efficient, and in other cases this allows them to support a
227 larger set of operations
228 </para>
229
230 <orderedlist>
231 <listitem>
232 <para>
233 Hash-based containers, for example, support look-up and
234 insertion methods (<function>find</function> and
235 <function>insert</function>). In order to locate elements
236 quickly, they are supplied a hash functor, which instruct
237 how to transform a key object into some size type; a hash
238 functor might transform <constant>"hello"</constant>
239 into <constant>1123002298</constant>. A hash table, though,
240 requires transforming each key object into some size-type
241 type in some specific domain; a hash table with a 128-long
242 table might transform <constant>"hello"</constant> into
243 position <constant>63</constant>. The policy by which the
244 hash value is transformed into a position within the table
245 can dramatically affect performance. Hash-based containers
246 also do not resize naturally (as opposed to tree-based
247 containers, for example). The appropriate resize policy is
248 unfortunately intertwined with the policy that transforms
249 hash value into a position within the table.
250 </para>
251 </listitem>
252
253 <listitem>
254 <para>
255 Tree-based containers, for example, also support look-up and
256 insertion methods, and are primarily useful when maintaining
257 order between elements is important. In some cases, though,
258 one can utilize their balancing algorithms for completely
259 different purposes.
260 </para>
261
262 <para>
263 Figure A shows a tree whose each node contains two entries:
264 a floating-point key, and some size-type
265 <emphasis>metadata</emphasis> (in bold beneath it) that is
266 the number of nodes in the sub-tree. (The root has key 0.99,
267 and has 5 nodes (including itself) in its sub-tree.) A
268 container based on this data structure can obviously answer
269 efficiently whether 0.3 is in the container object, but it
270 can also answer what is the order of 0.3 among all those in
271 the container object: see <xref linkend="biblio.clrs2001"/>.
272
273 </para>
274
275 <para>
276 As another example, Figure B shows a tree whose each node
277 contains two entries: a half-open geometric line interval,
278 and a number <emphasis>metadata</emphasis> (in bold beneath
279 it) that is the largest endpoint of all intervals in its
280 sub-tree. (The root describes the interval <constant>[20,
281 36)</constant>, and the largest endpoint in its sub-tree is
282 99.) A container based on this data structure can obviously
283 answer efficiently whether <constant>[3, 41)</constant> is
284 in the container object, but it can also answer efficiently
285 whether the container object has intervals that intersect
286 <constant>[3, 41)</constant>. These types of queries are
287 very useful in geometric algorithms and lease-management
288 algorithms.
289 </para>
290
291 <para>
292 It is important to note, however, that as the trees are
293 modified, their internal structure changes. To maintain
294 these invariants, one must supply some policy that is aware
295 of these changes. Without this, it would be better to use a
296 linked list (in itself very efficient for these purposes).
297 </para>
298
299 </listitem>
300 </orderedlist>
301
302 <figure>
303 <title>Node Invariants</title>
304 <mediaobject>
305 <imageobject>
306 <imagedata align="center" format="PNG" scale="100"
307 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_node_invariants.png"/>
308 </imageobject>
309 <textobject>
310 <phrase>Node Invariants</phrase>
311 </textobject>
312 </mediaobject>
313 </figure>
314
315 </section>
316
317 <section xml:id="motivation.associative.underlying">
318 <info><title>Underlying Data Structures</title></info>
319 <para>
320 The standard C++ library contains associative containers based on
321 red-black trees and collision-chaining hash tables. These are
322 very useful, but they are not ideal for all types of
323 settings.
324 </para>
325
326 <para>
327 The figure below shows the different underlying data structures
328 currently supported in this library.
329 </para>
330
331 <figure>
332 <title>Underlying Associative Data Structures</title>
333 <mediaobject>
334 <imageobject>
335 <imagedata align="center" format="PNG" scale="100"
336 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_different_underlying_dss_1.png"/>
337 </imageobject>
338 <textobject>
339 <phrase>Underlying Associative Data Structures</phrase>
340 </textobject>
341 </mediaobject>
342 </figure>
343
344 <para>
345 A shows a collision-chaining hash-table, B shows a probing
346 hash-table, C shows a red-black tree, D shows a splay tree, E shows
347 a tree based on an ordered vector(implicit in the order of the
348 elements), F shows a PATRICIA trie, and G shows a list-based
349 container with update policies.
350 </para>
351
352 <para>
353 Each of these data structures has some performance benefits, in
354 terms of speed, size or both. For now, note that vector-based trees
355 and probing hash tables manipulate memory more efficiently than
356 red-black trees and collision-chaining hash tables, and that
357 list-based associative containers are very useful for constructing
358 "multimaps".
359 </para>
360
361 <para>
362 Now consider a function manipulating a generic associative
363 container,
364 </para>
365 <programlisting>
366 template&lt;class Cntnr&gt;
367 int
368 some_op_sequence(Cntnr &amp;r_cnt)
369 {
370 ...
371 }
372 </programlisting>
373
374 <para>
375 Ideally, the underlying data structure
376 of <classname>Cntnr</classname> would not affect what can be
377 done with <varname>r_cnt</varname>. Unfortunately, this is not
378 the case.
379 </para>
380
381 <para>
382 For example, if <classname>Cntnr</classname>
383 is <classname>std::map</classname>, then the function can
384 use
385 </para>
386 <programlisting>
387 std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)
388 </programlisting>
389 <para>
390 in order to apply <classname>foobar</classname> to all
391 elements between <classname>foo</classname> and
392 <classname>bar</classname>. If
393 <classname>Cntnr</classname> is a hash-based container,
394 then this call's results are undefined.
395 </para>
396
397 <para>
398 Also, if <classname>Cntnr</classname> is tree-based, the type
399 and object of the comparison functor can be
400 accessed. If <classname>Cntnr</classname> is hash based, these
401 queries are nonsensical.
402 </para>
403
404 <para>
405 There are various other differences based on the container's
406 underlying data structure. For one, they can be constructed by,
407 and queried for, different policies. Furthermore:
408 </para>
409
410 <orderedlist>
411 <listitem>
412 <para>
413 Containers based on C, D, E and F store elements in a
414 meaningful order; the others store elements in a meaningless
415 (and probably time-varying) order. By implication, only
416 containers based on C, D, E and F can
417 support <function>erase</function> operations taking an
418 iterator and returning an iterator to the following element
419 without performance loss.
420 </para>
421 </listitem>
422
423 <listitem>
424 <para>
425 Containers based on C, D, E, and F can be split and joined
426 efficiently, while the others cannot. Containers based on C
427 and D, furthermore, can guarantee that this is exception-free;
428 containers based on E cannot guarantee this.
429 </para>
430 </listitem>
431
432 <listitem>
433 <para>
434 Containers based on all but E can guarantee that
435 erasing an element is exception free; containers based on E
436 cannot guarantee this. Containers based on all but B and E
437 can guarantee that modifying an object of their type does
438 not invalidate iterators or references to their elements,
439 while containers based on B and E cannot. Containers based
440 on C, D, and E can furthermore make a stronger guarantee,
441 namely that modifying an object of their type does not
442 affect the order of iterators.
443 </para>
444 </listitem>
445 </orderedlist>
446
447 <para>
448 A unified tag and traits system (as used for the C++ standard
449 library iterators, for example) can ease generic manipulation of
450 associative containers based on different underlying data
451 structures.
452 </para>
453
454 </section>
455
456 <section xml:id="motivation.associative.iterators">
457 <info><title>Iterators</title></info>
458 <para>
459 Iterators are centric to the design of the standard library
460 containers, because of the container/algorithm/iterator
461 decomposition that allows an algorithm to operate on a range
462 through iterators of some sequence. Iterators, then, are useful
463 because they allow going over a
464 specific <emphasis>sequence</emphasis>. The standard library
465 also uses iterators for accessing a
466 specific <emphasis>element</emphasis>: when an associative
467 container returns one through <function>find</function>. The
468 standard library consistently uses the same types of iterators
469 for both purposes: going over a range, and accessing a specific
470 found element. Before the introduction of hash-based containers
471 to the standard library, this made sense (with the exception of
472 priority queues, which are discussed later).
473 </para>
474
475 <para>
476 Using the standard associative containers together with
477 non-order-preserving associative containers (and also because of
478 priority-queues container), there is a possible need for
479 different types of iterators for self-organizing containers:
480 the iterator concept seems overloaded to mean two different
481 things (in some cases). <remark> XXX
482 "ds_gen.html#find_range">Design::Associative
483 Containers::Data-Structure Genericity::Point-Type and Range-Type
484 Methods</remark>.
485 </para>
486
487 <section xml:id="associative.iterators.using">
488 <info>
489 <title>Using Point Iterators for Range Operations</title>
490 </info>
491 <para>
492 Suppose <classname>cntnr</classname> is some associative
493 container, and say <varname>c</varname> is an object of
494 type <classname>cntnr</classname>. Then what will be the outcome
495 of
496 </para>
497
498 <programlisting>
499 std::for_each(c.find(1), c.find(5), foo);
500 </programlisting>
501
502 <para>
503 If <classname>cntnr</classname> is a tree-based container
504 object, then an in-order walk will
505 apply <classname>foo</classname> to the relevant elements,
506 as in the graphic below, label A. If <varname>c</varname> is
507 a hash-based container, then the order of elements between any
508 two elements is undefined (and probably time-varying); there is
509 no guarantee that the elements traversed will coincide with the
510 <emphasis>logical</emphasis> elements between 1 and 5, as in
511 label B.
512 </para>
513
514 <figure>
515 <title>Range Iteration in Different Data Structures</title>
516 <mediaobject>
517 <imageobject>
518 <imagedata align="center" format="PNG" scale="100"
519 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterators_range_ops_1.png"/>
520 </imageobject>
521 <textobject>
522 <phrase>Node Invariants</phrase>
523 </textobject>
524 </mediaobject>
525 </figure>
526
527 <para>
528 In our opinion, this problem is not caused just because
529 red-black trees are order preserving while
530 collision-chaining hash tables are (generally) not - it
531 is more fundamental. Most of the standard's containers
532 order sequences in a well-defined manner that is
533 determined by their <emphasis>interface</emphasis>:
534 calling <function>insert</function> on a tree-based
535 container modifies its sequence in a predictable way, as
536 does calling <function>push_back</function> on a list or
537 a vector. Conversely, collision-chaining hash tables,
538 probing hash tables, priority queues, and list-based
539 containers (which are very useful for "multimaps") are
540 self-organizing data structures; the effect of each
541 operation modifies their sequences in a manner that is
542 (practically) determined by their
543 <emphasis>implementation</emphasis>.
544 </para>
545
546 <para>
547 Consequently, applying an algorithm to a sequence obtained from most
548 containers may or may not make sense, but applying it to a
549 sub-sequence of a self-organizing container does not.
550 </para>
551 </section>
552
553 <section xml:id="associative.iterators.cost">
554 <info>
555 <title>Cost to Point Iterators to Enable Range Operations</title>
556 </info>
557 <para>
558 Suppose <varname>c</varname> is some collision-chaining
559 hash-based container object, and one calls
560 </para>
561 <programlisting>c.find(3)</programlisting>
562 <para>
563 Then what composes the returned iterator?
564 </para>
565
566 <para>
567 In the graphic below, label A shows the simplest (and
568 most efficient) implementation of a collision-chaining
569 hash table. The little box marked
570 <classname>point_iterator</classname> shows an object
571 that contains a pointer to the element's node. Note that
572 this "iterator" has no way to move to the next element (
573 it cannot support
574 <function>operator++</function>). Conversely, the little
575 box marked <classname>iterator</classname> stores both a
576 pointer to the element, as well as some other
577 information (the bucket number of the element). the
578 second iterator, then, is "heavier" than the first one-
579 it requires more time and space. If we were to use a
580 different container to cross-reference into this
581 hash-table using these iterators - it would take much
582 more space. As noted above, nothing much can be done by
583 incrementing these iterators, so why is this extra
584 information needed?
585 </para>
586
587 <para>
588 Alternatively, one might create a collision-chaining hash-table
589 where the lists might be linked, forming a monolithic total-element
590 list, as in the graphic below, label B. Here the iterators are as
591 light as can be, but the hash-table's operations are more
592 complicated.
593 </para>
594
595 <figure>
596 <title>Point Iteration in Hash Data Structures</title>
597 <mediaobject>
598 <imageobject>
599 <imagedata align="center" format="PNG" scale="100"
600 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterators_range_ops_2.png"/>
601 </imageobject>
602 <textobject>
603 <phrase>Point Iteration in Hash Data Structures</phrase>
604 </textobject>
605 </mediaobject>
606 </figure>
607
608 <para>
609 It should be noted that containers based on collision-chaining
610 hash-tables are not the only ones with this type of behavior;
611 many other self-organizing data structures display it as well.
612 </para>
613 </section>
614
615 <section xml:id="associative.iterators.invalidation">
616 <info><title>Invalidation Guarantees</title></info>
617 <para>Consider the following snippet:</para>
618 <programlisting>
619 it = c.find(3);
620 c.erase(5);
621 </programlisting>
622
623 <para>
624 Following the call to <classname>erase</classname>, what is the
625 validity of <classname>it</classname>: can it be de-referenced?
626 can it be incremented?
627 </para>
628
629 <para>
630 The answer depends on the underlying data structure of the
631 container. The graphic below shows three cases: A1 and A2 show
632 a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
633 show a collision-chaining hash table.
634 </para>
635
636 <figure>
637 <title>Effect of erase in different underlying data structures</title>
638 <mediaobject>
639 <imageobject>
640 <imagedata align="center" format="PNG" scale="100"
641 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_invalidation_guarantee_erase.png"/>
642 </imageobject>
643 <textobject>
644 <phrase>Effect of erase in different underlying data structures</phrase>
645 </textobject>
646 </mediaobject>
647 </figure>
648
649 <orderedlist>
650 <listitem>
651 <para>
652 Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
653 be de-referenced and incremented. The sequence of iterators
654 changed, but in a way that is well-defined by the interface.
655 </para>
656 </listitem>
657
658 <listitem>
659 <para>
660 Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
661 not valid at all - it cannot be de-referenced or
662 incremented; the order of iterators changed in a way that is
663 (practically) determined by the implementation and not by
664 the interface.
665 </para>
666 </listitem>
667
668 <listitem>
669 <para>
670 Erasing 5 from C1 yields C2. Here the situation is more
671 complicated. On the one hand, there is no problem in
672 de-referencing <classname>it</classname>. On the other hand,
673 the order of iterators changed in a way that is
674 (practically) determined by the implementation and not by
675 the interface.
676 </para>
677 </listitem>
678 </orderedlist>
679
680 <para>
681 So in the standard library containers, it is not always possible
682 to express whether <varname>it</varname> is valid or not. This
683 is true also for <function>insert</function>. Again, the
684 iterator concept seems overloaded.
685 </para>
686 </section>
687 </section> <!--iterators-->
688
689
690 <section xml:id="motivation.associative.functions">
691 <info><title>Functional</title></info>
692 <para>
693 </para>
694
695 <para>
696 The design of the functional overlay to the underlying data
697 structures differs slightly from some of the conventions used in
698 the C++ standard. A strict public interface of methods that
699 comprise only operations which depend on the class's internal
700 structure; other operations are best designed as external
701 functions. (See <xref linkend="biblio.meyers02both"/>).With this
702 rubric, the standard associative containers lack some useful
703 methods, and provide other methods which would be better
704 removed.
705 </para>
706
707 <section xml:id="motivation.associative.functions.erase">
708 <info><title><function>erase</function></title></info>
709
710 <orderedlist>
711 <listitem>
712 <para>
713 Order-preserving standard associative containers provide the
714 method
715 </para>
716 <programlisting>
717 iterator
718 erase(iterator it)
719 </programlisting>
720
721 <para>
722 which takes an iterator, erases the corresponding
723 element, and returns an iterator to the following
724 element. Also standardd hash-based associative
725 containers provide this method. This seemingly
726 increasesgenericity between associative containers,
727 since it is possible to use
728 </para>
729 <programlisting>
730 typename C::iterator it = c.begin();
731 typename C::iterator e_it = c.end();
732
733 while(it != e_it)
734 it = pred(*it)? c.erase(it) : ++it;
735 </programlisting>
736
737 <para>
738 in order to erase from a container object <varname>
739 c</varname> all element which match a
740 predicate <classname>pred</classname>. However, in a
741 different sense this actually decreases genericity: an
742 integral implication of this method is that tree-based
743 associative containers' memory use is linear in the total
744 number of elements they store, while hash-based
745 containers' memory use is unbounded in the total number of
746 elements they store. Assume a hash-based container is
747 allowed to decrease its size when an element is
748 erased. Then the elements might be rehashed, which means
749 that there is no "next" element - it is simply
750 undefined. Consequently, it is possible to infer from the
751 fact that the standard library's hash-based containers
752 provide this method that they cannot downsize when
753 elements are erased. As a consequence, different code is
754 needed to manipulate different containers, assuming that
755 memory should be conserved. Therefor, this library's
756 non-order preserving associative containers omit this
757 method.
758 </para>
759 </listitem>
760
761 <listitem>
762 <para>
763 All associative containers include a conditional-erase method
764 </para>
765 <programlisting>
766 template&lt;
767 class Pred&gt;
768 size_type
769 erase_if
770 (Pred pred)
771 </programlisting>
772 <para>
773 which erases all elements matching a predicate. This is probably the
774 only way to ensure linear-time multiple-item erase which can
775 actually downsize a container.
776 </para>
777 </listitem>
778
779 <listitem>
780 <para>
781 The standard associative containers provide methods for
782 multiple-item erase of the form
783 </para>
784 <programlisting>
785 size_type
786 erase(It b, It e)
787 </programlisting>
788 <para>
789 erasing a range of elements given by a pair of
790 iterators. For tree-based or trie-based containers, this can
791 implemented more efficiently as a (small) sequence of split
792 and join operations. For other, unordered, containers, this
793 method isn't much better than an external loop. Moreover,
794 if <varname>c</varname> is a hash-based container,
795 then
796 </para>
797 <programlisting>
798 c.erase(c.find(2), c.find(5))
799 </programlisting>
800 <para>
801 is almost certain to do something
802 different than erasing all elements whose keys are between 2
803 and 5, and is likely to produce other undefined behavior.
804 </para>
805 </listitem>
806 </orderedlist>
807 </section> <!-- erase -->
808
809 <section xml:id="motivation.associative.functions.split">
810 <info>
811 <title>
812 <function>split</function> and <function>join</function>
813 </title>
814 </info>
815 <para>
816 It is well-known that tree-based and trie-based container
817 objects can be efficiently split or joined (See
818 <xref linkend="biblio.clrs2001"/>). Externally splitting or
819 joining trees is super-linear, and, furthermore, can throw
820 exceptions. Split and join methods, consequently, seem good
821 choices for tree-based container methods, especially, since as
822 noted just before, they are efficient replacements for erasing
823 sub-sequences.
824 </para>
825
826 </section> <!-- split -->
827
828 <section xml:id="motivation.associative.functions.insert">
829 <info>
830 <title>
831 <function>insert</function>
832 </title>
833 </info>
834 <para>
835 The standard associative containers provide methods of the form
836 </para>
837 <programlisting>
838 template&lt;class It&gt;
839 size_type
840 insert(It b, It e);
841 </programlisting>
842
843 <para>
844 for inserting a range of elements given by a pair of
845 iterators. At best, this can be implemented as an external loop,
846 or, even more efficiently, as a join operation (for the case of
847 tree-based or trie-based containers). Moreover, these methods seem
848 similar to constructors taking a range given by a pair of
849 iterators; the constructors, however, are transactional, whereas
850 the insert methods are not; this is possibly confusing.
851 </para>
852
853 </section> <!-- insert -->
854
855 <section xml:id="motivation.associative.functions.compare">
856 <info>
857 <title>
858 <function>operator==</function> and <function>operator&lt;=</function>
859 </title>
860 </info>
861
862 <para>
863 Associative containers are parametrized by policies allowing to
864 test key equivalence: a hash-based container can do this through
865 its equivalence functor, and a tree-based container can do this
866 through its comparison functor. In addition, some standard
867 associative containers have global function operators, like
868 <function>operator==</function> and <function>operator&lt;=</function>,
869 that allow comparing entire associative containers.
870 </para>
871
872 <para>
873 In our opinion, these functions are better left out. To begin
874 with, they do not significantly improve over an external
875 loop. More importantly, however, they are possibly misleading -
876 <function>operator==</function>, for example, usually checks for
877 equivalence, or interchangeability, but the associative
878 container cannot check for values' equivalence, only keys'
879 equivalence; also, are two containers considered equivalent if
880 they store the same values in different order? this is an
881 arbitrary decision.
882 </para>
883 </section> <!-- compare -->
884
885 </section> <!-- functional -->
886
887 </section> <!--associative-->
888
889 <section xml:id="pbds.intro.motivation.priority_queue">
890 <info><title>Priority Queues</title></info>
891
892 <section xml:id="motivation.priority_queue.policy">
893 <info><title>Policy Choices</title></info>
894
895 <para>
896 Priority queues are containers that allow efficiently inserting
897 values and accessing the maximal value (in the sense of the
898 container's comparison functor). Their interface
899 supports <function>push</function>
900 and <function>pop</function>. The standard
901 container <classname>std::priorityqueue</classname> indeed support
902 these methods, but little else. For algorithmic and
903 software-engineering purposes, other methods are needed:
904 </para>
905
906 <orderedlist>
907 <listitem>
908 <para>
909 Many graph algorithms (see
910 <xref linkend="biblio.clrs2001"/>) require increasing a
911 value in a priority queue (again, in the sense of the
912 container's comparison functor), or joining two
913 priority-queue objects.
914 </para>
915 </listitem>
916
917 <listitem>
918 <para>The return type of <classname>priority_queue</classname>'s
919 <function>push</function> method is a point-type iterator, which can
920 be used for modifying or erasing arbitrary values. For
921 example:</para>
922 <programlisting>
923 priority_queue&lt;int&gt; p;
924 priority_queue&lt;int&gt;::point_iterator it = p.push(3);
925 p.modify(it, 4);
926 </programlisting>
927
928 <para>These types of cross-referencing operations are necessary
929 for making priority queues useful for different applications,
930 especially graph applications.</para>
931
932 </listitem>
933 <listitem>
934 <para>
935 It is sometimes necessary to erase an arbitrary value in a
936 priority queue. For example, consider
937 the <function>select</function> function for monitoring
938 file descriptors:
939 </para>
940
941 <programlisting>
942 int
943 select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
944 struct timeval *timeout);
945 </programlisting>
946 <para>
947 then, as the select documentation states:
948 </para>
949 <para>
950 <quote>
951 The nfds argument specifies the range of file
952 descriptors to be tested. The select() function tests file
953 descriptors in the range of 0 to nfds-1.</quote>
954 </para>
955
956 <para>
957 It stands to reason, therefore, that we might wish to
958 maintain a minimal value for <varname>nfds</varname>, and
959 priority queues immediately come to mind. Note, though, that
960 when a socket is closed, the minimal file description might
961 change; in the absence of an efficient means to erase an
962 arbitrary value from a priority queue, we might as well
963 avoid its use altogether.
964 </para>
965
966 <para>
967 The standard containers typically support iterators. It is
968 somewhat unusual
969 for <classname>std::priority_queue</classname> to omit them
970 (See <xref linkend="biblio.meyers01stl"/>). One might
971 ask why do priority queues need to support iterators, since
972 they are self-organizing containers with a different purpose
973 than abstracting sequences. There are several reasons:
974 </para>
975 <orderedlist>
976 <listitem>
977 <para>
978 Iterators (even in self-organizing containers) are
979 useful for many purposes: cross-referencing
980 containers, serialization, and debugging code that uses
981 these containers.
982 </para>
983 </listitem>
984
985 <listitem>
986 <para>
987 The standard library's hash-based containers support
988 iterators, even though they too are self-organizing
989 containers with a different purpose than abstracting
990 sequences.
991 </para>
992 </listitem>
993
994 <listitem>
995 <para>
996 In standard-library-like containers, it is natural to specify the
997 interface of operations for modifying a value or erasing
998 a value (discussed previously) in terms of a iterators.
999 It should be noted that the standard
1000 containers also use iterators for accessing and
1001 manipulating a specific value. In hash-based
1002 containers, one checks the existence of a key by
1003 comparing the iterator returned by <function>find</function> to the
1004 iterator returned by <function>end</function>, and not by comparing a
1005 pointer returned by <function>find</function> to <type>NULL</type>.
1006 </para>
1007 </listitem>
1008 </orderedlist>
1009 </listitem>
1010 </orderedlist>
1011
1012 </section>
1013
1014 <section xml:id="motivation.priority_queue.underlying">
1015 <info><title>Underlying Data Structures</title></info>
1016
1017 <para>
1018 There are three main implementations of priority queues: the
1019 first employs a binary heap, typically one which uses a
1020 sequence; the second uses a tree (or forest of trees), which is
1021 typically less structured than an associative container's tree;
1022 the third simply uses an associative container. These are
1023 shown in the figure below with labels A1 and A2, B, and C.
1024 </para>
1025
1026 <figure>
1027 <title>Underlying Priority Queue Data Structures</title>
1028 <mediaobject>
1029 <imageobject>
1030 <imagedata align="center" format="PNG" scale="100"
1031 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_different_underlying_dss_2.png"/>
1032 </imageobject>
1033 <textobject>
1034 <phrase>Underlying Priority Queue Data Structures</phrase>
1035 </textobject>
1036 </mediaobject>
1037 </figure>
1038
1039 <para>
1040 No single implementation can completely replace any of the
1041 others. Some have better <function>push</function>
1042 and <function>pop</function> amortized performance, some have
1043 better bounded (worst case) response time than others, some
1044 optimize a single method at the expense of others, etc. In
1045 general the "best" implementation is dictated by the specific
1046 problem.
1047 </para>
1048
1049 <para>
1050 As with associative containers, the more implementations
1051 co-exist, the more necessary a traits mechanism is for handling
1052 generic containers safely and efficiently. This is especially
1053 important for priority queues, since the invalidation guarantees
1054 of one of the most useful data structures - binary heaps - is
1055 markedly different than those of most of the others.
1056 </para>
1057
1058 </section>
1059
1060 <section xml:id="motivation.priority_queue.binary_heap">
1061 <info><title>Binary Heaps</title></info>
1062
1063
1064 <para>
1065 Binary heaps are one of the most useful underlying
1066 data structures for priority queues. They are very efficient in
1067 terms of memory (since they don't require per-value structure
1068 metadata), and have the best amortized <function>push</function> and
1069 <function>pop</function> performance for primitive types like
1070 <type>int</type>.
1071 </para>
1072
1073 <para>
1074 The standard library's <classname>priority_queue</classname>
1075 implements this data structure as an adapter over a sequence,
1076 typically
1077 <classname>std::vector</classname>
1078 or <classname>std::deque</classname>, which correspond to labels
1079 A1 and A2 respectively in the graphic above.
1080 </para>
1081
1082 <para>
1083 This is indeed an elegant example of the adapter concept and
1084 the algorithm/container/iterator decomposition. (See <xref linkend="biblio.nelson96stlpq"/>). There are
1085 several reasons why a binary-heap priority queue
1086 may be better implemented as a container instead of a
1087 sequence adapter:
1088 </para>
1089
1090 <orderedlist>
1091 <listitem>
1092 <para>
1093 <classname>std::priority_queue</classname> cannot erase values
1094 from its adapted sequence (irrespective of the sequence
1095 type). This means that the memory use of
1096 an <classname>std::priority_queue</classname> object is always
1097 proportional to the maximal number of values it ever contained,
1098 and not to the number of values that it currently
1099 contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
1100 This implementation of binary heaps acts very differently than
1101 other underlying data structures (See also pairing heaps).
1102 </para>
1103 </listitem>
1104
1105 <listitem>
1106 <para>
1107 Some combinations of adapted sequences and value types
1108 are very inefficient or just don't make sense. If one uses
1109 <classname>std::priority_queue&lt;std::vector&lt;std::string&gt;
1110 &gt; &gt;</classname>, for example, then not only will each
1111 operation perform a logarithmic number of
1112 <classname>std::string</classname> assignments, but, furthermore, any
1113 operation (including <function>pop</function>) can render the container
1114 useless due to exceptions. Conversely, if one uses
1115 <classname>std::priority_queue&lt;std::deque&lt;int&gt; &gt;
1116 &gt;</classname>, then each operation uses incurs a logarithmic
1117 number of indirect accesses (through pointers) unnecessarily.
1118 It might be better to let the container make a conservative
1119 deduction whether to use the structure in the graphic above, labels A1 or A2.
1120 </para>
1121 </listitem>
1122
1123 <listitem>
1124 <para>
1125 There does not seem to be a systematic way to determine
1126 what exactly can be done with the priority queue.
1127 </para>
1128 <orderedlist>
1129 <listitem>
1130 <para>
1131 If <classname>p</classname> is a priority queue adapting an
1132 <classname>std::vector</classname>, then it is possible to iterate over
1133 all values by using <function>&amp;p.top()</function> and
1134 <function>&amp;p.top() + p.size()</function>, but this will not work
1135 if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
1136 case, one cannot use <classname>p.begin()</classname> and
1137 <classname>p.end()</classname>. If a different sequence is adapted, it
1138 is even more difficult to determine what can be
1139 done.
1140 </para>
1141 </listitem>
1142
1143 <listitem>
1144 <para>
1145 If <varname>p</varname> is a priority queue adapting an
1146 <classname>std::deque</classname>, then the reference return by
1147 </para>
1148 <programlisting>
1149 p.top()
1150 </programlisting>
1151 <para>
1152 will remain valid until it is popped,
1153 but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1154 next <function>push</function> will invalidate it. If a different
1155 sequence is adapted, it is even more difficult to
1156 determine what can be done.
1157 </para>
1158 </listitem>
1159 </orderedlist>
1160 </listitem>
1161
1162 <listitem>
1163 <para>
1164 Sequence-based binary heaps can still implement
1165 linear-time <function>erase</function> and <function>modify</function> operations.
1166 This means that if one needs to erase a small
1167 (say logarithmic) number of values, then one might still
1168 choose this underlying data structure. Using
1169 <classname>std::priority_queue</classname>, however, this will generally
1170 change the order of growth of the entire sequence of
1171 operations.
1172 </para>
1173 </listitem>
1174 </orderedlist>
1175
1176 </section>
1177 </section>
1178 </section> <!-- goals/motivation -->
1179 </section> <!-- intro -->
1180
1181 <!-- S02: Using -->
1182 <section xml:id="containers.pbds.using">
1183 <info><title>Using</title></info>
1184 <?dbhtml filename="policy_data_structures_using.html"?>
1185
1186 <section xml:id="pbds.using.prereq">
1187 <info><title>Prerequisites</title></info>
1188
1189 <para>The library contains only header files, and does not require any
1190 other libraries except the standard C++ library . All classes are
1191 defined in namespace <code>__gnu_pbds</code>. The library internally
1192 uses macros beginning with <code>PB_DS</code>, but
1193 <code>#undef</code>s anything it <code>#define</code>s (except for
1194 header guards). Compiling the library in an environment where macros
1195 beginning in <code>PB_DS</code> are defined, may yield unpredictable
1196 results in compilation, execution, or both.</para>
1197
1198 <para>
1199 Further dependencies are necessary to create the visual output for the
1200 performance tests. To create these graphs, two additional packages
1201 will be needed: <command>pychart</command> and <command>Beautiful
1202 Soup</command>.
1203 </para>
1204 </section>
1205
1206 <section xml:id="pbds.using.organization">
1207 <info><title>Organization</title></info>
1208
1209 <para>
1210 The various data structures are organized as follows.
1211 </para>
1212
1213 <itemizedlist>
1214 <listitem>
1215 <para>
1216 Branch-Based
1217 </para>
1218
1219 <itemizedlist>
1220 <listitem>
1221 <para>
1222 <classname>basic_branch</classname>
1223 is an abstract base class for branched-based
1224 associative-containers
1225 </para>
1226 </listitem>
1227
1228 <listitem>
1229 <para>
1230 <classname>tree</classname>
1231 is a concrete base class for tree-based
1232 associative-containers
1233 </para>
1234 </listitem>
1235
1236 <listitem>
1237 <para>
1238 <classname>trie</classname>
1239 is a concrete base class trie-based
1240 associative-containers
1241 </para>
1242 </listitem>
1243 </itemizedlist>
1244 </listitem>
1245
1246 <listitem>
1247 <para>
1248 Hash-Based
1249 </para>
1250 <itemizedlist>
1251 <listitem>
1252 <para>
1253 <classname>basic_hash_table</classname>
1254 is an abstract base class for hash-based
1255 associative-containers
1256 </para>
1257 </listitem>
1258
1259 <listitem>
1260 <para>
1261 <classname>cc_hash_table</classname>
1262 is a concrete collision-chaining hash-based
1263 associative-containers
1264 </para>
1265 </listitem>
1266
1267 <listitem>
1268 <para>
1269 <classname>gp_hash_table</classname>
1270 is a concrete (general) probing hash-based
1271 associative-containers
1272 </para>
1273 </listitem>
1274 </itemizedlist>
1275 </listitem>
1276
1277 <listitem>
1278 <para>
1279 List-Based
1280 </para>
1281 <itemizedlist>
1282 <listitem>
1283 <para>
1284 <classname>list_update</classname>
1285 list-based update-policy associative container
1286 </para>
1287 </listitem>
1288 </itemizedlist>
1289 </listitem>
1290 <listitem>
1291 <para>
1292 Heap-Based
1293 </para>
1294 <itemizedlist>
1295 <listitem>
1296 <para>
1297 <classname>priority_queue</classname>
1298 A priority queue.
1299 </para>
1300 </listitem>
1301 </itemizedlist>
1302 </listitem>
1303 </itemizedlist>
1304
1305 <para>
1306 The hierarchy is composed naturally so that commonality is
1307 captured by base classes. Thus <function>operator[]</function>
1308 is defined at the base of any hierarchy, since all derived
1309 containers support it. Conversely <function>split</function> is
1310 defined in <classname>basic_branch</classname>, since only
1311 tree-like containers support it.
1312 </para>
1313
1314 <para>
1315 In addition, there are the following diagnostics classes,
1316 used to report errors specific to this library's data
1317 structures.
1318 </para>
1319
1320 <figure>
1321 <title>Exception Hierarchy</title>
1322 <mediaobject>
1323 <imageobject>
1324 <imagedata align="center" format="PDF" scale="75"
1325 fileref="../images/pbds_exception_hierarchy.pdf"/>
1326 </imageobject>
1327 <imageobject>
1328 <imagedata align="center" format="PNG" scale="100"
1329 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_exception_hierarchy.png"/>
1330 </imageobject>
1331 <textobject>
1332 <phrase>Exception Hierarchy</phrase>
1333 </textobject>
1334 </mediaobject>
1335 </figure>
1336
1337 </section>
1338
1339 <section xml:id="pbds.using.tutorial">
1340 <info><title>Tutorial</title></info>
1341
1342 <section xml:id="pbds.using.tutorial.basic">
1343 <info><title>Basic Use</title></info>
1344
1345 <para>
1346 For the most part, the policy-based containers containers in
1347 namespace <literal>__gnu_pbds</literal> have the same interface as
1348 the equivalent containers in the standard C++ library, except for
1349 the names used for the container classes themselves. For example,
1350 this shows basic operations on a collision-chaining hash-based
1351 container:
1352 </para>
1353 <programlisting>
1354 #include &lt;ext/pb_ds/assoc_container.h&gt;
1355
1356 int main()
1357 {
1358 __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
1359 c[2] = 'b';
1360 assert(c.find(1) == c.end());
1361 };
1362 </programlisting>
1363
1364 <para>
1365 The container is called
1366 <classname>__gnu_pbds::cc_hash_table</classname> instead of
1367 <classname>std::unordered_map</classname>, since <quote>unordered
1368 map</quote> does not necessarily mean a hash-based map as implied by
1369 the C++ library (C++0x or TR1). For example, list-based associative
1370 containers, which are very useful for the construction of
1371 "multimaps," are also unordered.
1372 </para>
1373
1374 <para>This snippet shows a red-black tree based container:</para>
1375
1376 <programlisting>
1377 #include &lt;ext/pb_ds/assoc_container.h&gt;
1378
1379 int main()
1380 {
1381 __gnu_pbds::tree&lt;int, char&gt; c;
1382 c[2] = 'b';
1383 assert(c.find(2) != c.end());
1384 };
1385 </programlisting>
1386
1387 <para>The container is called <classname>tree</classname> instead of
1388 <classname>map</classname> since the underlying data structures are
1389 being named with specificity.
1390 </para>
1391
1392 <para>
1393 The member function naming convention is to strive to be the same as
1394 the equivalent member functions in other C++ standard library
1395 containers. The familiar methods are unchanged:
1396 <function>begin</function>, <function>end</function>,
1397 <function>size</function>, <function>empty</function>, and
1398 <function>clear</function>.
1399 </para>
1400
1401 <para>
1402 This isn't to say that things are exactly as one would expect, given
1403 the container requirments and interfaces in the C++ standard.
1404 </para>
1405
1406 <para>
1407 The names of containers' policies and policy accessors are
1408 different then the usual. For example, if <type>hash_type</type> is
1409 some type of hash-based container, then</para>
1410
1411 <programlisting>
1412 hash_type::hash_fn
1413 </programlisting>
1414
1415 <para>
1416 gives the type of its hash functor, and if <varname>obj</varname> is
1417 some hash-based container object, then
1418 </para>
1419
1420 <programlisting>
1421 obj.get_hash_fn()
1422 </programlisting>
1423
1424 <para>will return a reference to its hash-functor object.</para>
1425
1426
1427 <para>
1428 Similarly, if <type>tree_type</type> is some type of tree-based
1429 container, then
1430 </para>
1431
1432 <programlisting>
1433 tree_type::cmp_fn
1434 </programlisting>
1435
1436 <para>
1437 gives the type of its comparison functor, and if
1438 <varname>obj</varname> is some tree-based container object,
1439 then
1440 </para>
1441
1442 <programlisting>
1443 obj.get_cmp_fn()
1444 </programlisting>
1445
1446 <para>will return a reference to its comparison-functor object.</para>
1447
1448 <para>
1449 It would be nice to give names consistent with those in the existing
1450 C++ standard (inclusive of TR1). Unfortunately, these standard
1451 containers don't consistently name types and methods. For example,
1452 <classname>std::tr1::unordered_map</classname> uses
1453 <type>hasher</type> for the hash functor, but
1454 <classname>std::map</classname> uses <type>key_compare</type> for
1455 the comparison functor. Also, we could not find an accessor for
1456 <classname>std::tr1::unordered_map</classname>'s hash functor, but
1457 <classname>std::map</classname> uses <classname>compare</classname>
1458 for accessing the comparison functor.
1459 </para>
1460
1461 <para>
1462 Instead, <literal>__gnu_pbds</literal> attempts to be internally
1463 consistent, and uses standard-derived terminology if possible.
1464 </para>
1465
1466 <para>
1467 Another source of difference is in scope:
1468 <literal>__gnu_pbds</literal> contains more types of associative
1469 containers than the standard C++ library, and more opportunities
1470 to configure these new containers, since different types of
1471 associative containers are useful in different settings.
1472 </para>
1473
1474 <para>
1475 Namespace <literal>__gnu_pbds</literal> contains different classes for
1476 hash-based containers, tree-based containers, trie-based containers,
1477 and list-based containers.
1478 </para>
1479
1480 <para>
1481 Since associative containers share parts of their interface, they
1482 are organized as a class hierarchy.
1483 </para>
1484
1485 <para>Each type or method is defined in the most-common ancestor
1486 in which it makes sense.
1487 </para>
1488
1489 <para>For example, all associative containers support iteration
1490 expressed in the following form:
1491 </para>
1492
1493 <programlisting>
1494 const_iterator
1495 begin() const;
1496
1497 iterator
1498 begin();
1499
1500 const_iterator
1501 end() const;
1502
1503 iterator
1504 end();
1505 </programlisting>
1506
1507 <para>
1508 But not all containers contain or use hash functors. Yet, both
1509 collision-chaining and (general) probing hash-based associative
1510 containers have a hash functor, so
1511 <classname>basic_hash_table</classname> contains the interface:
1512 </para>
1513
1514 <programlisting>
1515 const hash_fn&amp;
1516 get_hash_fn() const;
1517
1518 hash_fn&amp;
1519 get_hash_fn();
1520 </programlisting>
1521
1522 <para>
1523 so all hash-based associative containers inherit the same
1524 hash-functor accessor methods.
1525 </para>
1526
1527 </section> <!--basic use -->
1528
1529 <section xml:id="pbds.using.tutorial.configuring">
1530 <info>
1531 <title>
1532 Configuring via Template Parameters
1533 </title>
1534 </info>
1535
1536 <para>
1537 In general, each of this library's containers is
1538 parametrized by more policies than those of the standard library. For
1539 example, the standard hash-based container is parametrized as
1540 follows:
1541 </para>
1542 <programlisting>
1543 template&lt;typename Key, typename Mapped, typename Hash,
1544 typename Pred, typename Allocator, bool Cache_Hashe_Code&gt;
1545 class unordered_map;
1546 </programlisting>
1547
1548 <para>
1549 and so can be configured by key type, mapped type, a functor
1550 that translates keys to unsigned integral types, an equivalence
1551 predicate, an allocator, and an indicator whether to store hash
1552 values with each entry. this library's collision-chaining
1553 hash-based container is parametrized as
1554 </para>
1555 <programlisting>
1556 template&lt;typename Key, typename Mapped, typename Hash_Fn,
1557 typename Eq_Fn, typename Comb_Hash_Fn,
1558 typename Resize_Policy, bool Store_Hash
1559 typename Allocator&gt;
1560 class cc_hash_table;
1561 </programlisting>
1562
1563 <para>
1564 and so can be configured by the first four types of
1565 <classname>std::tr1::unordered_map</classname>, then a
1566 policy for translating the key-hash result into a position
1567 within the table, then a policy by which the table resizes,
1568 an indicator whether to store hash values with each entry,
1569 and an allocator (which is typically the last template
1570 parameter in standard containers).
1571 </para>
1572
1573 <para>
1574 Nearly all policy parameters have default values, so this
1575 need not be considered for casual use. It is important to
1576 note, however, that hash-based containers' policies can
1577 dramatically alter their performance in different settings,
1578 and that tree-based containers' policies can make them
1579 useful for other purposes than just look-up.
1580 </para>
1581
1582
1583 <para>As opposed to associative containers, priority queues have
1584 relatively few configuration options. The priority queue is
1585 parametrized as follows:</para>
1586 <programlisting>
1587 template&lt;typename Value_Type, typename Cmp_Fn,typename Tag,
1588 typename Allocator&gt;
1589 class priority_queue;
1590 </programlisting>
1591
1592 <para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1593 <classname>Allocator</classname> parameters are the container's value type,
1594 comparison-functor type, and allocator type, respectively;
1595 these are very similar to the standard's priority queue. The
1596 <classname>Tag</classname> parameter is different: there are a number of
1597 pre-defined tag types corresponding to binary heaps, binomial
1598 heaps, etc., and <classname>Tag</classname> should be instantiated
1599 by one of them.</para>
1600
1601 <para>Note that as opposed to the
1602 <classname>std::priority_queue</classname>,
1603 <classname>__gnu_pbds::priority_queue</classname> is not a
1604 sequence-adapter; it is a regular container.</para>
1605
1606 </section>
1607
1608 <section xml:id="pbds.using.tutorial.traits">
1609 <info>
1610 <title>
1611 Querying Container Attributes
1612 </title>
1613 </info>
1614 <para></para>
1615
1616 <para>A containers underlying data structure
1617 affect their performance; Unfortunately, they can also affect
1618 their interface. When manipulating generically associative
1619 containers, it is often useful to be able to statically
1620 determine what they can support and what the cannot.
1621 </para>
1622
1623 <para>Happily, the standard provides a good solution to a similar
1624 problem - that of the different behavior of iterators. If
1625 <classname>It</classname> is an iterator, then
1626 </para>
1627 <programlisting>
1628 typename std::iterator_traits&lt;It&gt;::iterator_category
1629 </programlisting>
1630
1631 <para>is one of a small number of pre-defined tag classes, and
1632 </para>
1633 <programlisting>
1634 typename std::iterator_traits&lt;It&gt;::value_type
1635 </programlisting>
1636
1637 <para>is the value type to which the iterator "points".</para>
1638
1639 <para>
1640 Similarly, in this library, if <type>C</type> is a
1641 container, then <classname>container_traits</classname> is a
1642 trait class that stores information about the kind of
1643 container that is implemented.
1644 </para>
1645 <programlisting>
1646 typename container_traits&lt;C&gt;::container_category
1647 </programlisting>
1648 <para>
1649 is one of a small number of predefined tag structures that
1650 uniquely identifies the type of underlying data structure.
1651 </para>
1652
1653 <para>In most cases, however, the exact underlying data
1654 structure is not really important, but what is important is
1655 one of its other attributes: whether it guarantees storing
1656 elements by key order, for example. For this one can
1657 use</para>
1658 <programlisting>
1659 typename container_traits&lt;C&gt;::order_preserving
1660 </programlisting>
1661 <para>
1662 Also,
1663 </para>
1664 <programlisting>
1665 typename container_traits&lt;C&gt;::invalidation_guarantee
1666 </programlisting>
1667
1668 <para>is the container's invalidation guarantee. Invalidation
1669 guarantees are especially important regarding priority queues,
1670 since in this library's design, iterators are practically the
1671 only way to manipulate them.</para>
1672 </section>
1673
1674 <section xml:id="pbds.using.tutorial.point_range_iteration">
1675 <info>
1676 <title>
1677 Point and Range Iteration
1678 </title>
1679 </info>
1680 <para></para>
1681
1682 <para>This library differentiates between two types of methods
1683 and iterators: point-type, and range-type. For example,
1684 <function>find</function> and <function>insert</function> are point-type methods, since
1685 they each deal with a specific element; their returned
1686 iterators are point-type iterators. <function>begin</function> and
1687 <function>end</function> are range-type methods, since they are not used to
1688 find a specific element, but rather to go over all elements in
1689 a container object; their returned iterators are range-type
1690 iterators.
1691 </para>
1692
1693 <para>Most containers store elements in an order that is
1694 determined by their interface. Correspondingly, it is fine that
1695 their point-type iterators are synonymous with their range-type
1696 iterators. For example, in the following snippet
1697 </para>
1698 <programlisting>
1699 std::for_each(c.find(1), c.find(5), foo);
1700 </programlisting>
1701 <para>
1702 two point-type iterators (returned by <function>find</function>) are used
1703 for a range-type purpose - going over all elements whose key is
1704 between 1 and 5.
1705 </para>
1706
1707 <para>
1708 Conversely, the above snippet makes no sense for
1709 self-organizing containers - ones that order (and reorder)
1710 their elements by implementation. It would be nice to have a
1711 uniform iterator system that would allow the above snippet to
1712 compile only if it made sense.
1713 </para>
1714
1715 <para>
1716 This could trivially be done by specializing
1717 <function>std::for_each</function> for the case of iterators returned by
1718 <classname>std::tr1::unordered_map</classname>, but this would only solve the
1719 problem for one algorithm and one container. Fundamentally, the
1720 problem is that one can loop using a self-organizing
1721 container's point-type iterators.
1722 </para>
1723
1724 <para>
1725 This library's containers define two families of
1726 iterators: <type>point_const_iterator</type> and
1727 <type>point_iterator</type> are the iterator types returned by
1728 point-type methods; <type>const_iterator</type> and
1729 <type>iterator</type> are the iterator types returned by range-type
1730 methods.
1731 </para>
1732 <programlisting>
1733 class &lt;- some container -&gt;
1734 {
1735 public:
1736 ...
1737
1738 typedef &lt;- something -&gt; const_iterator;
1739
1740 typedef &lt;- something -&gt; iterator;
1741
1742 typedef &lt;- something -&gt; point_const_iterator;
1743
1744 typedef &lt;- something -&gt; point_iterator;
1745
1746 ...
1747
1748 public:
1749 ...
1750
1751 const_iterator begin () const;
1752
1753 iterator begin();
1754
1755 point_const_iterator find(...) const;
1756
1757 point_iterator find(...);
1758 };
1759 </programlisting>
1760
1761 <para>For
1762 containers whose interface defines sequence order , it
1763 is very simple: point-type and range-type iterators are exactly
1764 the same, which means that the above snippet will compile if it
1765 is used for an order-preserving associative container.
1766 </para>
1767
1768 <para>
1769 For self-organizing containers, however, (hash-based
1770 containers as a special example), the preceding snippet will
1771 not compile, because their point-type iterators do not support
1772 <function>operator++</function>.
1773 </para>
1774
1775 <para>In any case, both for order-preserving and self-organizing
1776 containers, the following snippet will compile:
1777 </para>
1778 <programlisting>
1779 typename Cntnr::point_iterator it = c.find(2);
1780 </programlisting>
1781
1782 <para>
1783 because a range-type iterator can always be converted to a
1784 point-type iterator.
1785 </para>
1786
1787 <para>Distingushing between iterator types also
1788 raises the point that a container's iterators might have
1789 different invalidation rules concerning their de-referencing
1790 abilities and movement abilities. This now corresponds exactly
1791 to the question of whether point-type and range-type iterators
1792 are valid. As explained above, <classname>container_traits</classname> allows
1793 querying a container for its data structure attributes. The
1794 iterator-invalidation guarantees are certainly a property of
1795 the underlying data structure, and so
1796 </para>
1797 <programlisting>
1798 container_traits&lt;C&gt;::invalidation_guarantee
1799 </programlisting>
1800
1801 <para>
1802 gives one of three pre-determined types that answer this
1803 query.
1804 </para>
1805
1806 </section>
1807 </section> <!-- tutorial -->
1808
1809 <section xml:id="pbds.using.examples">
1810 <info><title>Examples</title></info>
1811 <para>
1812 Additional code examples are provided in the source
1813 distribution, as part of the regression and performance
1814 testsuite.
1815 </para>
1816
1817 <section xml:id="pbds.using.examples.basic">
1818 <info><title>Intermediate Use</title></info>
1819
1820 <itemizedlist>
1821 <listitem>
1822 <para>
1823 Basic use of maps:
1824 <filename>basic_map.cc</filename>
1825 </para>
1826 </listitem>
1827
1828 <listitem>
1829 <para>
1830 Basic use of sets:
1831 <filename>basic_set.cc</filename>
1832 </para>
1833 </listitem>
1834
1835 <listitem>
1836 <para>
1837 Conditionally erasing values from an associative container object:
1838 <filename>erase_if.cc</filename>
1839 </para>
1840 </listitem>
1841
1842 <listitem>
1843 <para>
1844 Basic use of multimaps:
1845 <filename>basic_multimap.cc</filename>
1846 </para>
1847 </listitem>
1848
1849 <listitem>
1850 <para>
1851 Basic use of multisets:
1852 <filename>basic_multiset.cc</filename>
1853 </para>
1854 </listitem>
1855
1856 <listitem>
1857 <para>
1858 Basic use of priority queues:
1859 <filename>basic_priority_queue.cc</filename>
1860 </para>
1861 </listitem>
1862
1863 <listitem>
1864 <para>
1865 Splitting and joining priority queues:
1866 <filename>priority_queue_split_join.cc</filename>
1867 </para>
1868 </listitem>
1869
1870 <listitem>
1871 <para>
1872 Conditionally erasing values from a priority queue:
1873 <filename>priority_queue_erase_if.cc</filename>
1874 </para>
1875 </listitem>
1876 </itemizedlist>
1877
1878 </section>
1879
1880 <section xml:id="pbds.using.examples.query">
1881 <info><title>Querying with <classname>container_traits</classname> </title></info>
1882 <itemizedlist>
1883 <listitem>
1884 <para>
1885 Using <classname>container_traits</classname> to query
1886 about underlying data structure behavior:
1887 <filename>assoc_container_traits.cc</filename>
1888 </para>
1889 </listitem>
1890
1891 <listitem>
1892 <para>
1893 A non-compiling example showing wrong use of finding keys in
1894 hash-based containers: <filename>hash_find_neg.cc</filename>
1895 </para>
1896 </listitem>
1897 <listitem>
1898 <para>
1899 Using <classname>container_traits</classname>
1900 to query about underlying data structure behavior:
1901 <filename>priority_queue_container_traits.cc</filename>
1902 </para>
1903 </listitem>
1904
1905 </itemizedlist>
1906
1907 </section>
1908
1909 <section xml:id="pbds.using.examples.container">
1910 <info><title>By Container Method</title></info>
1911 <para></para>
1912
1913 <section xml:id="pbds.using.examples.container.hash">
1914 <info><title>Hash-Based</title></info>
1915
1916 <section xml:id="pbds.using.examples.container.hash.resize">
1917 <info><title>size Related</title></info>
1918
1919 <itemizedlist>
1920 <listitem>
1921 <para>
1922 Setting the initial size of a hash-based container
1923 object:
1924 <filename>hash_initial_size.cc</filename>
1925 </para>
1926 </listitem>
1927
1928 <listitem>
1929 <para>
1930 A non-compiling example showing how not to resize a
1931 hash-based container object:
1932 <filename>hash_resize_neg.cc</filename>
1933 </para>
1934 </listitem>
1935
1936 <listitem>
1937 <para>
1938 Resizing the size of a hash-based container object:
1939 <filename>hash_resize.cc</filename>
1940 </para>
1941 </listitem>
1942
1943 <listitem>
1944 <para>
1945 Showing an illegal resize of a hash-based container
1946 object:
1947 <filename>hash_illegal_resize.cc</filename>
1948 </para>
1949 </listitem>
1950
1951 <listitem>
1952 <para>
1953 Changing the load factors of a hash-based container
1954 object: <filename>hash_load_set_change.cc</filename>
1955 </para>
1956 </listitem>
1957 </itemizedlist>
1958 </section>
1959
1960 <section xml:id="pbds.using.examples.container.hash.hashor">
1961 <info><title>Hashing Function Related</title></info>
1962 <para></para>
1963
1964 <itemizedlist>
1965 <listitem>
1966 <para>
1967 Using a modulo range-hashing function for the case of an
1968 unknown skewed key distribution:
1969 <filename>hash_mod.cc</filename>
1970 </para>
1971 </listitem>
1972
1973 <listitem>
1974 <para>
1975 Writing a range-hashing functor for the case of a known
1976 skewed key distribution:
1977 <filename>shift_mask.cc</filename>
1978 </para>
1979 </listitem>
1980
1981 <listitem>
1982 <para>
1983 Storing the hash value along with each key:
1984 <filename>store_hash.cc</filename>
1985 </para>
1986 </listitem>
1987
1988 <listitem>
1989 <para>
1990 Writing a ranged-hash functor:
1991 <filename>ranged_hash.cc</filename>
1992 </para>
1993 </listitem>
1994 </itemizedlist>
1995
1996 </section>
1997
1998 </section>
1999
2000 <section xml:id="pbds.using.examples.container.branch">
2001 <info><title>Branch-Based</title></info>
2002
2003
2004 <section xml:id="pbds.using.examples.container.branch.split">
2005 <info><title>split or join Related</title></info>
2006
2007 <itemizedlist>
2008 <listitem>
2009 <para>
2010 Joining two tree-based container objects:
2011 <filename>tree_join.cc</filename>
2012 </para>
2013 </listitem>
2014
2015 <listitem>
2016 <para>
2017 Splitting a PATRICIA trie container object:
2018 <filename>trie_split.cc</filename>
2019 </para>
2020 </listitem>
2021
2022 <listitem>
2023 <para>
2024 Order statistics while joining two tree-based container
2025 objects:
2026 <filename>tree_order_statistics_join.cc</filename>
2027 </para>
2028 </listitem>
2029 </itemizedlist>
2030
2031 </section>
2032
2033 <section xml:id="pbds.using.examples.container.branch.invariants">
2034 <info><title>Node Invariants</title></info>
2035
2036 <itemizedlist>
2037 <listitem>
2038 <para>
2039 Using trees for order statistics:
2040 <filename>tree_order_statistics.cc</filename>
2041 </para>
2042 </listitem>
2043
2044 <listitem>
2045 <para>
2046 Augmenting trees to support operations on line
2047 intervals:
2048 <filename>tree_intervals.cc</filename>
2049 </para>
2050 </listitem>
2051 </itemizedlist>
2052
2053 </section>
2054
2055 <section xml:id="pbds.using.examples.container.branch.trie">
2056 <info><title>trie</title></info>
2057 <itemizedlist>
2058 <listitem>
2059 <para>
2060 Using a PATRICIA trie for DNA strings:
2061 <filename>trie_dna.cc</filename>
2062 </para>
2063 </listitem>
2064
2065 <listitem>
2066 <para>
2067 Using a PATRICIA
2068 trie for finding all entries whose key matches a given prefix:
2069 <filename>trie_prefix_search.cc</filename>
2070 </para>
2071 </listitem>
2072 </itemizedlist>
2073
2074 </section>
2075
2076 </section>
2077
2078 <section xml:id="pbds.using.examples.container.priority_queue">
2079 <info><title>Priority Queues</title></info>
2080 <itemizedlist>
2081 <listitem>
2082 <para>
2083 Cross referencing an associative container and a priority
2084 queue: <filename>priority_queue_xref.cc</filename>
2085 </para>
2086 </listitem>
2087
2088 <listitem>
2089 <para>
2090 Cross referencing a vector and a priority queue using a
2091 very simple version of Dijkstra's shortest path
2092 algorithm:
2093 <filename>priority_queue_dijkstra.cc</filename>
2094 </para>
2095 </listitem>
2096 </itemizedlist>
2097
2098 </section>
2099
2100
2101 </section>
2102
2103 </section>
2104
2105 </section> <!-- using -->
2106
2107 <!-- S03: Design -->
2108
2109
2110 <section xml:id="containers.pbds.design">
2111 <info><title>Design</title></info>
2112 <?dbhtml filename="policy_data_structures_design.html"?>
2113 <para></para>
2114
2115 <section xml:id="pbds.design.concepts">
2116 <info><title>Concepts</title></info>
2117
2118 <section xml:id="pbds.design.concepts.null_type">
2119 <info><title>Null Policy Classes</title></info>
2120
2121 <para>
2122 Associative containers are typically parametrized by various
2123 policies. For example, a hash-based associative container is
2124 parametrized by a hash-functor, transforming each key into an
2125 non-negative numerical type. Each such value is then further mapped
2126 into a position within the table. The mapping of a key into a
2127 position within the table is therefore a two-step process.
2128 </para>
2129
2130 <para>
2131 In some cases, instantiations are redundant. For example, when the
2132 keys are integers, it is possible to use a redundant hash policy,
2133 which transforms each key into its value.
2134 </para>
2135
2136 <para>
2137 In some other cases, these policies are irrelevant. For example, a
2138 hash-based associative container might transform keys into positions
2139 within a table by a different method than the two-step method
2140 described above. In such a case, the hash functor is simply
2141 irrelevant.
2142 </para>
2143
2144 <para>
2145 When a policy is either redundant or irrelevant, it can be replaced
2146 by <classname>null_type</classname>.
2147 </para>
2148
2149 <para>
2150 For example, a <emphasis>set</emphasis> is an associative
2151 container with one of its template parameters (the one for the
2152 mapped type) replaced with <classname>null_type</classname>. Other
2153 places simplifications are made possible with this technique
2154 include node updates in tree and trie data structures, and hash
2155 and probe functions for hash data structures.
2156 </para>
2157 </section>
2158
2159 <section xml:id="pbds.design.concepts.associative_semantics">
2160 <info><title>Map and Set Semantics</title></info>
2161
2162 <section xml:id="concepts.associative_semantics.set_vs_map">
2163 <info>
2164 <title>
2165 Distinguishing Between Maps and Sets
2166 </title>
2167 </info>
2168
2169 <para>
2170 Anyone familiar with the standard knows that there are four kinds
2171 of associative containers: maps, sets, multimaps, and
2172 multisets. The map datatype associates each key to
2173 some data.
2174 </para>
2175
2176 <para>
2177 Sets are associative containers that simply store keys -
2178 they do not map them to anything. In the standard, each map class
2179 has a corresponding set class. E.g.,
2180 <classname>std::map&lt;int, char&gt;</classname> maps each
2181 <classname>int</classname> to a <classname>char</classname>, but
2182 <classname>std::set&lt;int, char&gt;</classname> simply stores
2183 <classname>int</classname>s. In this library, however, there are no
2184 distinct classes for maps and sets. Instead, an associative
2185 container's <classname>Mapped</classname> template parameter is a policy: if
2186 it is instantiated by <classname>null_type</classname>, then it
2187 is a "set"; otherwise, it is a "map". E.g.,
2188 </para>
2189 <programlisting>
2190 cc_hash_table&lt;int, char&gt;
2191 </programlisting>
2192 <para>
2193 is a "map" mapping each <type>int</type> value to a <type>
2194 char</type>, but
2195 </para>
2196 <programlisting>
2197 cc_hash_table&lt;int, null_type&gt;
2198 </programlisting>
2199 <para>
2200 is a type that uniquely stores <type>int</type> values.
2201 </para>
2202 <para>Once the <classname>Mapped</classname> template parameter is instantiated
2203 by <classname>null_type</classname>, then
2204 the "set" acts very similarly to the standard's sets - it does not
2205 map each key to a distinct <classname>null_type</classname> object. Also,
2206 , the container's <type>value_type</type> is essentially
2207 its <type>key_type</type> - just as with the standard's sets
2208 .</para>
2209
2210 <para>
2211 The standard's multimaps and multisets allow, respectively,
2212 non-uniquely mapping keys and non-uniquely storing keys. As
2213 discussed, the
2214 reasons why this might be necessary are 1) that a key might be
2215 decomposed into a primary key and a secondary key, 2) that a
2216 key might appear more than once, or 3) any arbitrary
2217 combination of 1)s and 2)s. Correspondingly,
2218 one should use 1) "maps" mapping primary keys to secondary
2219 keys, 2) "maps" mapping keys to size types, or 3) any arbitrary
2220 combination of 1)s and 2)s. Thus, for example, an
2221 <classname>std::multiset&lt;int&gt;</classname> might be used to store
2222 multiple instances of integers, but using this library's
2223 containers, one might use
2224 </para>
2225 <programlisting>
2226 tree&lt;int, size_t&gt;
2227 </programlisting>
2228
2229 <para>
2230 i.e., a <classname>map</classname> of <type>int</type>s to
2231 <type>size_t</type>s.
2232 </para>
2233 <para>
2234 These "multimaps" and "multisets" might be confusing to
2235 anyone familiar with the standard's <classname>std::multimap</classname> and
2236 <classname>std::multiset</classname>, because there is no clear
2237 correspondence between the two. For example, in some cases
2238 where one uses <classname>std::multiset</classname> in the standard, one might use
2239 in this library a "multimap" of "multisets" - i.e., a
2240 container that maps primary keys each to an associative
2241 container that maps each secondary key to the number of times
2242 it occurs.
2243 </para>
2244
2245 <para>
2246 When one uses a "multimap," one should choose with care the
2247 type of container used for secondary keys.
2248 </para>
2249 </section> <!-- map vs set -->
2250
2251
2252 <section xml:id="concepts.associative_semantics.multi">
2253 <info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>
2254
2255 <para>
2256 Brace onself: this library does not contain containers like
2257 <classname>std::multimap</classname> or
2258 <classname>std::multiset</classname>. Instead, these data
2259 structures can be synthesized via manipulation of the
2260 <classname>Mapped</classname> template parameter.
2261 </para>
2262 <para>
2263 One maps the unique part of a key - the primary key, into an
2264 associative-container of the (originally) non-unique parts of
2265 the key - the secondary key. A primary associative-container
2266 is an associative container of primary keys; a secondary
2267 associative-container is an associative container of
2268 secondary keys.
2269 </para>
2270
2271 <para>
2272 Stepping back a bit, and starting in from the beginning.
2273 </para>
2274
2275
2276 <para>
2277 Maps (or sets) allow mapping (or storing) unique-key values.
2278 The standard library also supplies associative containers which
2279 map (or store) multiple values with equivalent keys:
2280 <classname>std::multimap</classname>, <classname>std::multiset</classname>,
2281 <classname>std::tr1::unordered_multimap</classname>, and
2282 <classname>unordered_multiset</classname>. We first discuss how these might
2283 be used, then why we think it is best to avoid them.
2284 </para>
2285
2286 <para>
2287 Suppose one builds a simple bank-account application that
2288 records for each client (identified by an <classname>std::string</classname>)
2289 and account-id (marked by an <type>unsigned long</type>) -
2290 the balance in the account (described by a
2291 <type>float</type>). Suppose further that ordering this
2292 information is not useful, so a hash-based container is
2293 preferable to a tree based container. Then one can use
2294 </para>
2295
2296 <programlisting>
2297 std::tr1::unordered_map&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2298 </programlisting>
2299
2300 <para>
2301 which hashes every combination of client and account-id. This
2302 might work well, except for the fact that it is now impossible
2303 to efficiently list all of the accounts of a specific client
2304 (this would practically require iterating over all
2305 entries). Instead, one can use
2306 </para>
2307
2308 <programlisting>
2309 std::tr1::unordered_multimap&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2310 </programlisting>
2311
2312 <para>
2313 which hashes every client, and decides equivalence based on
2314 client only. This will ensure that all accounts belonging to a
2315 specific user are stored consecutively.
2316 </para>
2317
2318 <para>
2319 Also, suppose one wants an integers' priority queue
2320 (a container that supports <function>push</function>,
2321 <function>pop</function>, and <function>top</function> operations, the last of which
2322 returns the largest <type>int</type>) that also supports
2323 operations such as <function>find</function> and <function>lower_bound</function>. A
2324 reasonable solution is to build an adapter over
2325 <classname>std::set&lt;int&gt;</classname>. In this adapter,
2326 <function>push</function> will just call the tree-based
2327 associative container's <function>insert</function> method; <function>pop</function>
2328 will call its <function>end</function> method, and use it to return the
2329 preceding element (which must be the largest). Then this might
2330 work well, except that the container object cannot hold
2331 multiple instances of the same integer (<function>push(4)</function>,
2332 will be a no-op if <constant>4</constant> is already in the
2333 container object). If multiple keys are necessary, then one
2334 might build the adapter over an
2335 <classname>std::multiset&lt;int&gt;</classname>.
2336 </para>
2337
2338 <para>
2339 The standard library's non-unique-mapping containers are useful
2340 when (1) a key can be decomposed in to a primary key and a
2341 secondary key, (2) a key is needed multiple times, or (3) any
2342 combination of (1) and (2).
2343 </para>
2344
2345 <para>
2346 The graphic below shows how the standard library's container
2347 design works internally; in this figure nodes shaded equally
2348 represent equivalent-key values. Equivalent keys are stored
2349 consecutively using the properties of the underlying data
2350 structure: binary search trees (label A) store equivalent-key
2351 values consecutively (in the sense of an in-order walk)
2352 naturally; collision-chaining hash tables (label B) store
2353 equivalent-key values in the same bucket, the bucket can be
2354 arranged so that equivalent-key values are consecutive.
2355 </para>
2356
2357 <figure>
2358 <title>Non-unique Mapping Standard Containers</title>
2359 <mediaobject>
2360 <imageobject>
2361 <imagedata align="center" format="PNG" scale="100"
2362 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_1.png"/>
2363 </imageobject>
2364 <textobject>
2365 <phrase>Non-unique Mapping Standard Containers</phrase>
2366 </textobject>
2367 </mediaobject>
2368 </figure>
2369
2370 <para>
2371 Put differently, the standards' non-unique mapping
2372 associative-containers are associative containers that map
2373 primary keys to linked lists that are embedded into the
2374 container. The graphic below shows again the two
2375 containers from the first graphic above, this time with
2376 the embedded linked lists of the grayed nodes marked
2377 explicitly.
2378 </para>
2379
2380 <figure xml:id="fig.pbds_embedded_lists_2">
2381 <title>
2382 Effect of embedded lists in
2383 <classname>std::multimap</classname>
2384 </title>
2385 <mediaobject>
2386 <imageobject>
2387 <imagedata align="center" format="PNG" scale="100"
2388 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_2.png"/>
2389 </imageobject>
2390 <textobject>
2391 <phrase>
2392 Effect of embedded lists in
2393 <classname>std::multimap</classname>
2394 </phrase>
2395 </textobject>
2396 </mediaobject>
2397 </figure>
2398
2399 <para>
2400 These embedded linked lists have several disadvantages.
2401 </para>
2402
2403 <orderedlist>
2404 <listitem>
2405 <para>
2406 The underlying data structure embeds the linked lists
2407 according to its own consideration, which means that the
2408 search path for a value might include several different
2409 equivalent-key values. For example, the search path for the
2410 the black node in either of the first graphic, labels A or B,
2411 includes more than a single gray node.
2412 </para>
2413 </listitem>
2414
2415 <listitem>
2416 <para>
2417 The links of the linked lists are the underlying data
2418 structures' nodes, which typically are quite structured. In
2419 the case of tree-based containers (the grapic above, label
2420 B), each "link" is actually a node with three pointers (one
2421 to a parent and two to children), and a
2422 relatively-complicated iteration algorithm. The linked
2423 lists, therefore, can take up quite a lot of memory, and
2424 iterating over all values equal to a given key (through the
2425 return value of the standard
2426 library's <function>equal_range</function>) can be
2427 expensive.
2428 </para>
2429 </listitem>
2430
2431 <listitem>
2432 <para>
2433 The primary key is stored multiply; this uses more memory.
2434 </para>
2435 </listitem>
2436
2437 <listitem>
2438 <para>
2439 Finally, the interface of this design excludes several
2440 useful underlying data structures. Of all the unordered
2441 self-organizing data structures, practically only
2442 collision-chaining hash tables can (efficiently) guarantee
2443 that equivalent-key values are stored consecutively.
2444 </para>
2445 </listitem>
2446 </orderedlist>
2447
2448 <para>
2449 The above reasons hold even when the ratio of secondary keys to
2450 primary keys (or average number of identical keys) is small, but
2451 when it is large, there are more severe problems:
2452 </para>
2453
2454 <orderedlist>
2455 <listitem>
2456 <para>
2457 The underlying data structures order the links inside each
2458 embedded linked-lists according to their internal
2459 considerations, which effectively means that each of the
2460 links is unordered. Irrespective of the underlying data
2461 structure, searching for a specific value can degrade to
2462 linear complexity.
2463 </para>
2464 </listitem>
2465
2466 <listitem>
2467 <para>
2468 Similarly to the above point, it is impossible to apply
2469 to the secondary keys considerations that apply to primary
2470 keys. For example, it is not possible to maintain secondary
2471 keys by sorted order.
2472 </para>
2473 </listitem>
2474
2475 <listitem>
2476 <para>
2477 While the interface "understands" that all equivalent-key
2478 values constitute a distinct list (through
2479 <function>equal_range</function>), the underlying data
2480 structure typically does not. This means that operations such
2481 as erasing from a tree-based container all values whose keys
2482 are equivalent to a a given key can be super-linear in the
2483 size of the tree; this is also true also for several other
2484 operations that target a specific list.
2485 </para>
2486 </listitem>
2487
2488 </orderedlist>
2489
2490 <para>
2491 In this library, all associative containers map
2492 (or store) unique-key values. One can (1) map primary keys to
2493 secondary associative-containers (containers of
2494 secondary keys) or non-associative containers (2) map identical
2495 keys to a size-type representing the number of times they
2496 occur, or (3) any combination of (1) and (2). Instead of
2497 allowing multiple equivalent-key values, this library
2498 supplies associative containers based on underlying
2499 data structures that are suitable as secondary
2500 associative-containers.
2501 </para>
2502
2503 <para>
2504 In the figure below, labels A and B show the equivalent
2505 underlying data structures in this library, as mapped to the
2506 first graphic above. Labels A and B, respectively. Each shaded
2507 box represents some size-type or secondary
2508 associative-container.
2509 </para>
2510
2511 <figure>
2512 <title>Non-unique Mapping Containers</title>
2513 <mediaobject>
2514 <imageobject>
2515 <imagedata align="center" format="PNG" scale="100"
2516 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_3.png"/>
2517 </imageobject>
2518 <textobject>
2519 <phrase>Non-unique Mapping Containers</phrase>
2520 </textobject>
2521 </mediaobject>
2522 </figure>
2523
2524 <para>
2525 In the first example above, then, one would use an associative
2526 container mapping each user to an associative container which
2527 maps each application id to a start time (see
2528 <filename>example/basic_multimap.cc</filename>); in the second
2529 example, one would use an associative container mapping
2530 each <classname>int</classname> to some size-type indicating the
2531 number of times it logically occurs
2532 (see <filename>example/basic_multiset.cc</filename>.
2533 </para>
2534
2535 <para>
2536 See the discussion in list-based container types for containers
2537 especially suited as secondary associative-containers.
2538 </para>
2539 </section>
2540
2541 </section> <!-- map and set semantics -->
2542
2543 <section xml:id="pbds.design.concepts.iterator_semantics">
2544 <info><title>Iterator Semantics</title></info>
2545
2546 <section xml:id="concepts.iterator_semantics.point_and_range">
2547 <info><title>Point and Range Iterators</title></info>
2548
2549 <para>
2550 Iterator concepts are bifurcated in this design, and are
2551 comprised of point-type and range-type iteration.
2552 </para>
2553
2554 <para>
2555 A point-type iterator is an iterator that refers to a specific
2556 element as returned through an
2557 associative-container's <function>find</function> method.
2558 </para>
2559
2560 <para>
2561 A range-type iterator is an iterator that is used to go over a
2562 sequence of elements, as returned by a container's
2563 <function>find</function> method.
2564 </para>
2565
2566 <para>
2567 A point-type method is a method that
2568 returns a point-type iterator; a range-type method is a method
2569 that returns a range-type iterator.
2570 </para>
2571
2572 <para>For most containers, these types are synonymous; for
2573 self-organizing containers, such as hash-based containers or
2574 priority queues, these are inherently different (in any
2575 implementation, including that of C++ standard library
2576 components), but in this design, it is made explicit. They are
2577 distinct types.
2578 </para>
2579 </section>
2580
2581
2582 <section xml:id="concepts.iterator_semantics.both">
2583 <info><title>Distinguishing Point and Range Iterators</title></info>
2584
2585 <para>When using this library, is necessary to differentiate
2586 between two types of methods and iterators: point-type methods and
2587 iterators, and range-type methods and iterators. Each associative
2588 container's interface includes the methods:</para>
2589 <programlisting>
2590 point_const_iterator
2591 find(const_key_reference r_key) const;
2592
2593 point_iterator
2594 find(const_key_reference r_key);
2595
2596 std::pair&lt;point_iterator,bool&gt;
2597 insert(const_reference r_val);
2598 </programlisting>
2599
2600 <para>The relationship between these iterator types varies between
2601 container types. The figure below
2602 shows the most general invariant between point-type and
2603 range-type iterators: In <emphasis>A</emphasis> <literal>iterator</literal>, can
2604 always be converted to <literal>point_iterator</literal>. In <emphasis>B</emphasis>
2605 shows invariants for order-preserving containers: point-type
2606 iterators are synonymous with range-type iterators.
2607 Orthogonally, <emphasis>C</emphasis>shows invariants for "set"
2608 containers: iterators are synonymous with const iterators.</para>
2609
2610 <figure>
2611 <title>Point Iterator Hierarchy</title>
2612 <mediaobject>
2613 <imageobject>
2614 <imagedata align="center" format="PNG" scale="100"
2615 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterator_hierarchy.png"/>
2616 </imageobject>
2617 <textobject>
2618 <phrase>Point Iterator Hierarchy</phrase>
2619 </textobject>
2620 </mediaobject>
2621 </figure>
2622
2623
2624 <para>Note that point-type iterators in self-organizing containers
2625 (hash-based associative containers) lack movement
2626 operators, such as <literal>operator++</literal> - in fact, this
2627 is the reason why this library differentiates from the standard C++ librarys
2628 design on this point.</para>
2629
2630 <para>Typically, one can determine an iterator's movement
2631 capabilities using
2632 <literal>std::iterator_traits&lt;It&gt;iterator_category</literal>,
2633 which is a <literal>struct</literal> indicating the iterator's
2634 movement capabilities. Unfortunately, none of the standard predefined
2635 categories reflect a pointer's <emphasis>not</emphasis> having any
2636 movement capabilities whatsoever. Consequently,
2637 <literal>pb_ds</literal> adds a type
2638 <literal>trivial_iterator_tag</literal> (whose name is taken from
2639 a concept in C++ standardese, which is the category of iterators
2640 with no movement capabilities.) All other standard C++ library
2641 tags, such as <literal>forward_iterator_tag</literal> retain their
2642 common use.</para>
2643
2644 </section>
2645
2646 <section xml:id="pbds.design.concepts.invalidation">
2647 <info><title>Invalidation Guarantees</title></info>
2648 <para>
2649 If one manipulates a container object, then iterators previously
2650 obtained from it can be invalidated. In some cases a
2651 previously-obtained iterator cannot be de-referenced; in other cases,
2652 the iterator's next or previous element might have changed
2653 unpredictably. This corresponds exactly to the question whether a
2654 point-type or range-type iterator (see previous concept) is valid or
2655 not. In this design, one can query a container (in compile time) about
2656 its invalidation guarantees.
2657 </para>
2658
2659
2660 <para>
2661 Given three different types of associative containers, a modifying
2662 operation (in that example, <function>erase</function>) invalidated
2663 iterators in three different ways: the iterator of one container
2664 remained completely valid - it could be de-referenced and
2665 incremented; the iterator of a different container could not even be
2666 de-referenced; the iterator of the third container could be
2667 de-referenced, but its "next" iterator changed unpredictably.
2668 </para>
2669
2670 <para>
2671 Distinguishing between find and range types allows fine-grained
2672 invalidation guarantees, because these questions correspond exactly
2673 to the question of whether point-type iterators and range-type
2674 iterators are valid. The graphic below shows tags corresponding to
2675 different types of invalidation guarantees.
2676 </para>
2677
2678 <figure>
2679 <title>Invalidation Guarantee Tags Hierarchy</title>
2680 <mediaobject>
2681 <imageobject>
2682 <imagedata align="center" format="PDF" scale="75"
2683 fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
2684 </imageobject>
2685 <imageobject>
2686 <imagedata align="center" format="PNG" scale="100"
2687 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_invalidation_tag_hierarchy.png"/>
2688 </imageobject>
2689 <textobject>
2690 <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
2691 </textobject>
2692 </mediaobject>
2693 </figure>
2694
2695 <itemizedlist>
2696 <listitem>
2697 <para>
2698 <classname>basic_invalidation_guarantee</classname>
2699 corresponds to a basic guarantee that a point-type iterator,
2700 a found pointer, or a found reference, remains valid as long
2701 as the container object is not modified.
2702 </para>
2703 </listitem>
2704
2705 <listitem>
2706 <para>
2707 <classname>point_invalidation_guarantee</classname>
2708 corresponds to a guarantee that a point-type iterator, a
2709 found pointer, or a found reference, remains valid even if
2710 the container object is modified.
2711 </para>
2712 </listitem>
2713
2714 <listitem>
2715 <para>
2716 <classname>range_invalidation_guarantee</classname>
2717 corresponds to a guarantee that a range-type iterator remains
2718 valid even if the container object is modified.
2719 </para>
2720 </listitem>
2721 </itemizedlist>
2722
2723 <para>To find the invalidation guarantee of a
2724 container, one can use</para>
2725 <programlisting>
2726 typename container_traits&lt;Cntnr&gt;::invalidation_guarantee
2727 </programlisting>
2728
2729 <para>Note that this hierarchy corresponds to the logic it
2730 represents: if a container has range-invalidation guarantees,
2731 then it must also have find invalidation guarantees;
2732 correspondingly, its invalidation guarantee (in this case
2733 <classname>range_invalidation_guarantee</classname>)
2734 can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
2735 This means that this this hierarchy can be used easily using
2736 standard metaprogramming techniques, by specializing on the
2737 type of <literal>invalidation_guarantee</literal>.</para>
2738
2739 <para>
2740 These types of problems were addressed, in a more general
2741 setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
2742 our opinion, an invalidation-guarantee hierarchy would solve
2743 these problems in all container types - not just associative
2744 containers.
2745 </para>
2746
2747 </section>
2748 </section> <!-- iterator semantics -->
2749
2750 <section xml:id="pbds.design.concepts.genericity">
2751 <info><title>Genericity</title></info>
2752
2753 <para>
2754 The design attempts to address the following problem of
2755 data-structure genericity. When writing a function manipulating
2756 a generic container object, what is the behavior of the object?
2757 Suppose one writes
2758 </para>
2759 <programlisting>
2760 template&lt;typename Cntnr&gt;
2761 void
2762 some_op_sequence(Cntnr &amp;r_container)
2763 {
2764 ...
2765 }
2766 </programlisting>
2767
2768 <para>
2769 then one needs to address the following questions in the body
2770 of <function>some_op_sequence</function>:
2771 </para>
2772
2773 <itemizedlist>
2774 <listitem>
2775 <para>
2776 Which types and methods does <literal>Cntnr</literal> support?
2777 Containers based on hash tables can be queries for the
2778 hash-functor type and object; this is meaningless for tree-based
2779 containers. Containers based on trees can be split, joined, or
2780 can erase iterators and return the following iterator; this
2781 cannot be done by hash-based containers.
2782 </para>
2783 </listitem>
2784
2785 <listitem>
2786 <para>
2787 What are the exception and invalidation guarantees
2788 of <literal>Cntnr</literal>? A container based on a probing
2789 hash-table invalidates all iterators when it is modified; this
2790 is not the case for containers based on node-based
2791 trees. Containers based on a node-based tree can be split or
2792 joined without exceptions; this is not the case for containers
2793 based on vector-based trees.
2794 </para>
2795 </listitem>
2796
2797 <listitem>
2798 <para>
2799 How does the container maintain its elements? Tree-based and
2800 Trie-based containers store elements by key order; others,
2801 typically, do not. A container based on a splay trees or lists
2802 with update policies "cache" "frequently accessed" elements;
2803 containers based on most other underlying data structures do
2804 not.
2805 </para>
2806 </listitem>
2807 <listitem>
2808 <para>
2809 How does one query a container about characteristics and
2810 capabilities? What is the relationship between two different
2811 data structures, if anything?
2812 </para>
2813 </listitem>
2814 </itemizedlist>
2815
2816 <para>The remainder of this section explains these issues in
2817 detail.</para>
2818
2819
2820 <section xml:id="concepts.genericity.tag">
2821 <info><title>Tag</title></info>
2822 <para>
2823 Tags are very useful for manipulating generic types. For example, if
2824 <literal>It</literal> is an iterator class, then <literal>typename
2825 It::iterator_category</literal> or <literal>typename
2826 std::iterator_traits&lt;It&gt;::iterator_category</literal> will
2827 yield its category, and <literal>typename
2828 std::iterator_traits&lt;It&gt;::value_type</literal> will yield its
2829 value type.
2830 </para>
2831
2832 <para>
2833 This library contains a container tag hierarchy corresponding to the
2834 diagram below.
2835 </para>
2836
2837 <figure>
2838 <title>Container Tag Hierarchy</title>
2839 <mediaobject>
2840 <imageobject>
2841 <imagedata align="center" format="PDF" scale="75"
2842 fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2843 </imageobject>
2844 <imageobject>
2845 <imagedata align="center" format="PNG" scale="100"
2846 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_container_tag_hierarchy.png"/>
2847 </imageobject>
2848 <textobject>
2849 <phrase>Container Tag Hierarchy</phrase>
2850 </textobject>
2851 </mediaobject>
2852 </figure>
2853
2854 <para>
2855 Given any container <type>Cntnr</type>, the tag of
2856 the underlying data structure can be found via <literal>typename
2857 Cntnr::container_category</literal>.
2858 </para>
2859
2860 </section> <!-- tag -->
2861
2862 <section xml:id="concepts.genericity.traits">
2863 <info><title>Traits</title></info>
2864 <para></para>
2865
2866 <para>Additionally, a traits mechanism can be used to query a
2867 container type for its attributes. Given any container
2868 <literal>Cntnr</literal>, then <literal>&lt;Cntnr&gt;</literal>
2869 is a traits class identifying the properties of the
2870 container.</para>
2871
2872 <para>To find if a container can throw when a key is erased (which
2873 is true for vector-based trees, for example), one can
2874 use
2875 </para>
2876 <programlisting>container_traits&lt;Cntnr&gt;::erase_can_throw</programlisting>
2877
2878 <para>
2879 Some of the definitions in <classname>container_traits</classname>
2880 are dependent on other
2881 definitions. If <classname>container_traits&lt;Cntnr&gt;::order_preserving</classname>
2882 is <constant>true</constant> (which is the case for containers
2883 based on trees and tries), then the container can be split or
2884 joined; in this
2885 case, <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2886 indicates whether splits or joins can throw exceptions (which is
2887 true for vector-based trees);
2888 otherwise <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2889 will yield a compilation error. (This is somewhat similar to a
2890 compile-time version of the COM model).
2891 </para>
2892
2893 </section> <!-- traits -->
2894
2895 </section> <!-- genericity -->
2896 </section> <!-- concepts -->
2897
2898 <section xml:id="pbds.design.container">
2899 <info><title>By Container</title></info>
2900
2901 <!-- hash -->
2902 <section xml:id="pbds.design.container.hash">
2903 <info><title>hash</title></info>
2904
2905 <!--
2906
2907 // hash policies
2908 /// general terms / background
2909 /// range hashing policies
2910 /// ranged-hash policies
2911 /// implementation
2912
2913 // resize policies
2914 /// general
2915 /// size policies
2916 /// trigger policies
2917 /// implementation
2918
2919 // policy interactions
2920 /// probe/size/trigger
2921 /// hash/trigger
2922 /// eq/hash/storing hash values
2923 /// size/load-check trigger
2924 -->
2925 <section xml:id="container.hash.interface">
2926 <info><title>Interface</title></info>
2927
2928
2929
2930 <para>
2931 The collision-chaining hash-based container has the
2932 following declaration.</para>
2933 <programlisting>
2934 template&lt;
2935 typename Key,
2936 typename Mapped,
2937 typename Hash_Fn = std::hash&lt;Key&gt;,
2938 typename Eq_Fn = std::equal_to&lt;Key&gt;,
2939 typename Comb_Hash_Fn = direct_mask_range_hashing&lt;&gt;
2940 typename Resize_Policy = default explained below.
2941 bool Store_Hash = false,
2942 typename Allocator = std::allocator&lt;char&gt; &gt;
2943 class cc_hash_table;
2944 </programlisting>
2945
2946 <para>The parameters have the following meaning:</para>
2947
2948 <orderedlist>
2949 <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2950
2951 <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2952
2953 <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2954
2955 <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2956
2957 <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing_functor;
2958 it describes how to translate hash values into positions
2959 within the table. </para></listitem>
2960
2961 <listitem><para><classname>Resize_Policy</classname> describes how a container object
2962 should change its internal size. </para></listitem>
2963
2964 <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2965 should be stored with each entry. </para></listitem>
2966
2967 <listitem><para><classname>Allocator</classname> is an allocator
2968 type.</para></listitem>
2969 </orderedlist>
2970
2971 <para>The probing hash-based container has the following
2972 declaration.</para>
2973 <programlisting>
2974 template&lt;
2975 typename Key,
2976 typename Mapped,
2977 typename Hash_Fn = std::hash&lt;Key&gt;,
2978 typename Eq_Fn = std::equal_to&lt;Key&gt;,
2979 typename Comb_Probe_Fn = direct_mask_range_hashing&lt;&gt;
2980 typename Probe_Fn = default explained below.
2981 typename Resize_Policy = default explained below.
2982 bool Store_Hash = false,
2983 typename Allocator = std::allocator&lt;char&gt; &gt;
2984 class gp_hash_table;
2985 </programlisting>
2986
2987 <para>The parameters are identical to those of the
2988 collision-chaining container, except for the following.</para>
2989
2990 <orderedlist>
2991 <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2992 sequence into a sequence of positions within the table.</para></listitem>
2993
2994 <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2995 </orderedlist>
2996
2997 <para>Some of the default template values depend on the values of
2998 other parameters, and are explained below.</para>
2999
3000 </section>
3001 <section xml:id="container.hash.details">
3002 <info><title>Details</title></info>
3003
3004 <section xml:id="container.hash.details.hash_policies">
3005 <info><title>Hash Policies</title></info>
3006
3007 <section xml:id="details.hash_policies.general">
3008 <info><title>General</title></info>
3009
3010 <para>Following is an explanation of some functions which hashing
3011 involves. The graphic below illustrates the discussion.</para>
3012
3013 <figure>
3014 <title>Hash functions, ranged-hash functions, and
3015 range-hashing functions</title>
3016 <mediaobject>
3017 <imageobject>
3018 <imagedata align="center" format="PNG" scale="100"
3019 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
3020 </imageobject>
3021 <textobject>
3022 <phrase>Hash functions, ranged-hash functions, and
3023 range-hashing functions</phrase>
3024 </textobject>
3025 </mediaobject>
3026 </figure>
3027
3028 <para>Let U be a domain (e.g., the integers, or the
3029 strings of 3 characters). A hash-table algorithm needs to map
3030 elements of U "uniformly" into the range [0,..., m -
3031 1] (where m is a non-negative integral value, and
3032 is, in general, time varying). I.e., the algorithm needs
3033 a ranged-hash function</para>
3034
3035 <para>
3036 f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3037 </para>
3038
3039 <para>such that for any u in U ,</para>
3040
3041 <para>0 ≤ f(u, m) ≤ m - 1</para>
3042
3043 <para>and which has "good uniformity" properties (say
3044 <xref linkend="biblio.knuth98sorting"/>.)
3045 One
3046 common solution is to use the composition of the hash
3047 function</para>
3048
3049 <para>h : U → Z<subscript>+</subscript> ,</para>
3050
3051 <para>which maps elements of U into the non-negative
3052 integrals, and</para>
3053
3054 <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
3055 Z<subscript>+</subscript>,</para>
3056
3057 <para>which maps a non-negative hash value, and a non-negative
3058 range upper-bound into a non-negative integral in the range
3059 between 0 (inclusive) and the range upper bound (exclusive),
3060 i.e., for any r in Z<subscript>+</subscript>,</para>
3061
3062 <para>0 ≤ g(r, m) ≤ m - 1</para>
3063
3064
3065 <para>The resulting ranged-hash function, is</para>
3066
3067 <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3068 <equation>
3069 <title>Ranged Hash Function</title>
3070 <mathphrase>
3071 f(u , m) = g(h(u), m)
3072 </mathphrase>
3073 </equation>
3074
3075 <para>From the above, it is obvious that given g and
3076 h, f can always be composed (however the converse
3077 is not true). The standard's hash-based containers allow specifying
3078 a hash function, and use a hard-wired range-hashing function;
3079 the ranged-hash function is implicitly composed.</para>
3080
3081 <para>The above describes the case where a key is to be mapped
3082 into a single position within a hash table, e.g.,
3083 in a collision-chaining table. In other cases, a key is to be
3084 mapped into a sequence of positions within a table,
3085 e.g., in a probing table. Similar terms apply in this
3086 case: the table requires a ranged probe function,
3087 mapping a key into a sequence of positions withing the table.
3088 This is typically achieved by composing a hash function
3089 mapping the key into a non-negative integral type, a
3090 probe function transforming the hash value into a
3091 sequence of hash values, and a range-hashing function
3092 transforming the sequence of hash values into a sequence of
3093 positions.</para>
3094
3095 </section>
3096
3097 <section xml:id="details.hash_policies.range">
3098 <info><title>Range Hashing</title></info>
3099
3100 <para>Some common choices for range-hashing functions are the
3101 division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3102 as</para>
3103
3104 <equation>
3105 <title>Range-Hashing, Division Method</title>
3106 <mathphrase>
3107 g(r, m) = r mod m
3108 </mathphrase>
3109 </equation>
3110
3111
3112
3113 <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>
3114
3115 <para>and</para>
3116
3117 <para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>
3118
3119 <para>respectively, for some positive integrals u and
3120 v (typically powers of 2), and some a. Each of
3121 these range-hashing functions works best for some different
3122 setting.</para>
3123
3124 <para>The division method (see above) is a
3125 very common choice. However, even this single method can be
3126 implemented in two very different ways. It is possible to
3127 implement using the low
3128 level % (modulo) operation (for any m), or the
3129 low level &amp; (bit-mask) operation (for the case where
3130 m is a power of 2), i.e.,</para>
3131
3132 <equation>
3133 <title>Division via Prime Modulo</title>
3134 <mathphrase>
3135 g(r, m) = r % m
3136 </mathphrase>
3137 </equation>
3138
3139 <para>and</para>
3140
3141 <equation>
3142 <title>Division via Bit Mask</title>
3143 <mathphrase>
3144 g(r, m) = r &amp; m - 1, (with m =
3145 2<superscript>k</superscript> for some k)
3146 </mathphrase>
3147 </equation>
3148
3149
3150 <para>respectively.</para>
3151
3152 <para>The % (modulo) implementation has the advantage that for
3153 m a prime far from a power of 2, g(r, m) is
3154 affected by all the bits of r (minimizing the chance of
3155 collision). It has the disadvantage of using the costly modulo
3156 operation. This method is hard-wired into SGI's implementation
3157 .</para>
3158
3159 <para>The &amp; (bit-mask) implementation has the advantage of
3160 relying on the fast bit-wise and operation. It has the
3161 disadvantage that for g(r, m) is affected only by the
3162 low order bits of r. This method is hard-wired into
3163 Dinkumware's implementation.</para>
3164
3165
3166 </section>
3167
3168 <section xml:id="details.hash_policies.ranged">
3169 <info><title>Ranged Hash</title></info>
3170
3171 <para>In cases it is beneficial to allow the
3172 client to directly specify a ranged-hash hash function. It is
3173 true, that the writer of the ranged-hash function cannot rely
3174 on the values of m having specific numerical properties
3175 suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
3176 the values of m are determined by a resize policy with
3177 possibly orthogonal considerations.</para>
3178
3179 <para>There are two cases where a ranged-hash function can be
3180 superior. The firs is when using perfect hashing: the
3181 second is when the values of m can be used to estimate
3182 the "general" number of distinct values required. This is
3183 described in the following.</para>
3184
3185 <para>Let</para>
3186
3187 <para>
3188 s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3189 </para>
3190
3191 <para>be a string of t characters, each of which is from
3192 domain S. Consider the following ranged-hash
3193 function:</para>
3194 <equation>
3195 <title>
3196 A Standard String Hash Function
3197 </title>
3198 <mathphrase>
3199 f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
3200 0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3201 </mathphrase>
3202 </equation>
3203
3204
3205 <para>where a is some non-negative integral value. This is
3206 the standard string-hashing function used in SGI's
3207 implementation (with a = 5). Its advantage is that
3208 it takes into account all of the characters of the string.</para>
3209
3210 <para>Now assume that s is the string representation of a
3211 of a long DNA sequence (and so S = {'A', 'C', 'G',
3212 'T'}). In this case, scanning the entire string might be
3213 prohibitively expensive. A possible alternative might be to use
3214 only the first k characters of the string, where</para>
3215
3216 <para>|S|<superscript>k</superscript> ≥ m ,</para>
3217
3218 <para>i.e., using the hash function</para>
3219
3220 <equation>
3221 <title>
3222 Only k String DNA Hash
3223 </title>
3224 <mathphrase>
3225 f<subscript>2</subscript>(s, m) = ∑ <subscript>i
3226 = 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3227 </mathphrase>
3228 </equation>
3229
3230 <para>requiring scanning over only</para>
3231
3232 <para>k = log<subscript>4</subscript>( m )</para>
3233
3234 <para>characters.</para>
3235
3236 <para>Other more elaborate hash-functions might scan k
3237 characters starting at a random position (determined at each
3238 resize), or scanning k random positions (determined at
3239 each resize), i.e., using</para>
3240
3241 <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
3242 r</subscript>0<superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
3243 a<superscript>i</superscript> mod m ,</para>
3244
3245 <para>or</para>
3246
3247 <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
3248 1</superscript> s<subscript>r</subscript>i a<superscript>r<subscript>i</subscript></superscript> mod
3249 m ,</para>
3250
3251 <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
3252 each in the (inclusive) range [0,...,t-1].</para>
3253
3254 <para>It should be noted that the above functions cannot be
3255 decomposed as per a ranged hash composed of hash and range hashing.</para>
3256
3257
3258 </section>
3259
3260 <section xml:id="details.hash_policies.implementation">
3261 <info><title>Implementation</title></info>
3262
3263 <para>This sub-subsection describes the implementation of
3264 the above in this library. It first explains range-hashing
3265 functions in collision-chaining tables, then ranged-hash
3266 functions in collision-chaining tables, then probing-based
3267 tables, and finally lists the relevant classes in this
3268 library.</para>
3269
3270 <section xml:id="hash_policies.implementation.collision-chaining">
3271 <info><title>
3272 Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3273 </title></info>
3274
3275
3276 <para><classname>cc_hash_table</classname> is
3277 parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3278 hash functor and a combining hash functor, respectively.</para>
3279
3280 <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3281 range-hashing functor. <classname>cc_hash_table</classname>
3282 synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
3283 <classname>Comb_Hash_Fn</classname>. The figure below shows an <classname>insert</classname> sequence
3284 diagram for this case. The user inserts an element (point A),
3285 the container transforms the key into a non-negative integral
3286 using the hash functor (points B and C), and transforms the
3287 result into a position using the combining functor (points D
3288 and E).</para>
3289
3290 <figure>
3291 <title>Insert hash sequence diagram</title>
3292 <mediaobject>
3293 <imageobject>
3294 <imagedata align="center" format="PNG" scale="100"
3295 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_range_hashing_seq_diagram.png"/>
3296 </imageobject>
3297 <textobject>
3298 <phrase>Insert hash sequence diagram</phrase>
3299 </textobject>
3300 </mediaobject>
3301 </figure>
3302
3303 <para>If <classname>cc_hash_table</classname>'s
3304 hash-functor, <classname>Hash_Fn</classname> is instantiated by <classname>null_type</classname> , then <classname>Comb_Hash_Fn</classname> is taken to be
3305 a ranged-hash function. The graphic below shows an <function>insert</function> sequence
3306 diagram. The user inserts an element (point A), the container
3307 transforms the key into a position using the combining functor
3308 (points B and C).</para>
3309
3310 <figure>
3311 <title>Insert hash sequence diagram with a null policy</title>
3312 <mediaobject>
3313 <imageobject>
3314 <imagedata align="center" format="PNG" scale="100"
3315 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_range_hashing_seq_diagram2.png"/>
3316 </imageobject>
3317 <textobject>
3318 <phrase>Insert hash sequence diagram with a null policy</phrase>
3319 </textobject>
3320 </mediaobject>
3321 </figure>
3322
3323 </section>
3324
3325 <section xml:id="hash_policies.implementation.probe">
3326 <info><title>
3327 Probing tables
3328 </title></info>
3329 <para><classname>gp_hash_table</classname> is parametrized by
3330 <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3331 and <classname>Comb_Probe_Fn</classname>. As before, if
3332 <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3333 are both <classname>null_type</classname>, then
3334 <classname>Comb_Probe_Fn</classname> is a ranged-probe
3335 functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3336 functor, <classname>Probe_Fn</classname> is a functor for offsets
3337 from a hash value, and <classname>Comb_Probe_Fn</classname>
3338 transforms a probe sequence into a sequence of positions within
3339 the table.</para>
3340
3341 </section>
3342
3343 <section xml:id="hash_policies.implementation.predefined">
3344 <info><title>
3345 Pre-Defined Policies
3346 </title></info>
3347
3348 <para>This library contains some pre-defined classes
3349 implementing range-hashing and probing functions:</para>
3350
3351 <orderedlist>
3352 <listitem><para><classname>direct_mask_range_hashing</classname>
3353 and <classname>direct_mod_range_hashing</classname>
3354 are range-hashing functions based on a bit-mask and a modulo
3355 operation, respectively.</para></listitem>
3356
3357 <listitem><para><classname>linear_probe_fn</classname>, and
3358 <classname>quadratic_probe_fn</classname> are
3359 a linear probe and a quadratic probe function,
3360 respectively.</para></listitem>
3361 </orderedlist>
3362
3363 <para>
3364 The graphic below shows the relationships.
3365 </para>
3366 <figure>
3367 <title>Hash policy class diagram</title>
3368 <mediaobject>
3369 <imageobject>
3370 <imagedata align="center" format="PNG" scale="100"
3371 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_policy_cd.png"/>
3372 </imageobject>
3373 <textobject>
3374 <phrase>Hash policy class diagram</phrase>
3375 </textobject>
3376 </mediaobject>
3377 </figure>
3378
3379
3380 </section>
3381
3382 </section> <!-- impl -->
3383
3384 </section>
3385
3386 <section xml:id="container.hash.details.resize_policies">
3387 <info><title>Resize Policies</title></info>
3388
3389 <section xml:id="resize_policies.general">
3390 <info><title>General</title></info>
3391
3392 <para>Hash-tables, as opposed to trees, do not naturally grow or
3393 shrink. It is necessary to specify policies to determine how
3394 and when a hash table should change its size. Usually, resize
3395 policies can be decomposed into orthogonal policies:</para>
3396
3397 <orderedlist>
3398 <listitem><para>A size policy indicating how a hash table
3399 should grow (e.g., it should multiply by powers of
3400 2).</para></listitem>
3401
3402 <listitem><para>A trigger policy indicating when a hash
3403 table should grow (e.g., a load factor is
3404 exceeded).</para></listitem>
3405 </orderedlist>
3406
3407 </section>
3408
3409 <section xml:id="resize_policies.size">
3410 <info><title>Size Policies</title></info>
3411
3412
3413 <para>Size policies determine how a hash table changes size. These
3414 policies are simple, and there are relatively few sensible
3415 options. An exponential-size policy (with the initial size and
3416 growth factors both powers of 2) works well with a mask-based
3417 range-hashing function, and is the
3418 hard-wired policy used by Dinkumware. A
3419 prime-list based policy works well with a modulo-prime range
3420 hashing function and is the hard-wired policy used by SGI's
3421 implementation.</para>
3422
3423 </section>
3424
3425 <section xml:id="resize_policies.trigger">
3426 <info><title>Trigger Policies</title></info>
3427
3428 <para>Trigger policies determine when a hash table changes size.
3429 Following is a description of two policies: load-check
3430 policies, and collision-check policies.</para>
3431
3432 <para>Load-check policies are straightforward. The user specifies
3433 two factors, Α<subscript>min</subscript> and
3434 Α<subscript>max</subscript>, and the hash table maintains the
3435 invariant that</para>
3436
3437 <para>Α<subscript>min</subscript> ≤ (number of
3438 stored elements) / (hash-table size) ≤
3439 Α<subscript>max</subscript><remark>load factor min max</remark></para>
3440
3441 <para>Collision-check policies work in the opposite direction of
3442 load-check policies. They focus on keeping the number of
3443 collisions moderate and hoping that the size of the table will
3444 not grow very large, instead of keeping a moderate load-factor
3445 and hoping that the number of collisions will be small. A
3446 maximal collision-check policy resizes when the longest
3447 probe-sequence grows too large.</para>
3448
3449 <para>Consider the graphic below. Let the size of the hash table
3450 be denoted by m, the length of a probe sequence be denoted by k,
3451 and some load factor be denoted by Α. We would like to
3452 calculate the minimal length of k, such that if there were Α
3453 m elements in the hash table, a probe sequence of length k would
3454 be found with probability at most 1/m.</para>
3455
3456 <figure>
3457 <title>Balls and bins</title>
3458 <mediaobject>
3459 <imageobject>
3460 <imagedata align="center" format="PNG" scale="100"
3461 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_balls_and_bins.png"/>
3462 </imageobject>
3463 <textobject>
3464 <phrase>Balls and bins</phrase>
3465 </textobject>
3466 </mediaobject>
3467 </figure>
3468
3469 <para>Denote the probability that a probe sequence of length
3470 k appears in bin i by p<subscript>i</subscript>, the
3471 length of the probe sequence of bin i by
3472 l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3473
3474
3475
3476 <equation>
3477 <title>
3478 Probability of Probe Sequence of Length k
3479 </title>
3480 <mathphrase>
3481 p<subscript>1</subscript> =
3482 </mathphrase>
3483 </equation>
3484
3485 <para>P(l<subscript>1</subscript> ≥ k) =</para>
3486
3487 <para>
3488 P(l<subscript>1</subscript> ≥ α ( 1 + k / α - 1) ≤ (a)
3489 </para>
3490
3491 <para>
3492 e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) /2)
3493 </para>
3494
3495 <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
3496 calculate the probability that some bin contains a probe
3497 sequence greater than k, we note that the
3498 l<subscript>i</subscript> are negatively-dependent
3499 (<xref linkend="biblio.dubhashi98neg"/>)
3500 . Let
3501 I(.) denote the indicator function. Then</para>
3502
3503 <equation>
3504 <title>
3505 Probability Probe Sequence in Some Bin
3506 </title>
3507 <mathphrase>
3508 P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
3509 </mathphrase>
3510 </equation>
3511
3512 <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
3513 I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =</para>
3514
3515 <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript> I (
3516 l<subscript>i</subscript> ≥ k ) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
3517 p<subscript>1</subscript>) - 1 ) ) ≤ (a)</para>
3518
3519 <para>e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
3520 - 1 ) <superscript>2</superscript> ) / 2 ) ,</para>
3521
3522 <para>where (a) follows from the fact that the Chernoff bound can
3523 be applied to negatively-dependent variables (<xref
3524 linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3525 equation into the second one, and equating with 1/m, we
3526 obtain</para>
3527
3528
3529 <para>k ~ √ ( 2 α ln 2 m ln(m) )
3530 ) .</para>
3531
3532 </section>
3533
3534 <section xml:id="resize_policies.impl">
3535 <info><title>Implementation</title></info>
3536
3537 <para>This sub-subsection describes the implementation of the
3538 above in this library. It first describes resize policies and
3539 their decomposition into trigger and size policies, then
3540 describes pre-defined classes, and finally discusses controlled
3541 access the policies' internals.</para>
3542
3543 <section xml:id="resize_policies.impl.decomposition">
3544 <info><title>Decomposition</title></info>
3545
3546
3547 <para>Each hash-based container is parametrized by a
3548 <classname>Resize_Policy</classname> parameter; the container derives
3549 <classname>public</classname>ly from <classname>Resize_Policy</classname>. For
3550 example:</para>
3551 <programlisting>
3552 cc_hash_table&lt;typename Key,
3553 typename Mapped,
3554 ...
3555 typename Resize_Policy
3556 ...&gt; : public Resize_Policy
3557 </programlisting>
3558
3559 <para>As a container object is modified, it continuously notifies
3560 its <classname>Resize_Policy</classname> base of internal changes
3561 (e.g., collisions encountered and elements being
3562 inserted). It queries its <classname>Resize_Policy</classname> base whether
3563 it needs to be resized, and if so, to what size.</para>
3564
3565 <para>The graphic below shows a (possible) sequence diagram
3566 of an insert operation. The user inserts an element; the hash
3567 table notifies its resize policy that a search has started
3568 (point A); in this case, a single collision is encountered -
3569 the table notifies its resize policy of this (point B); the
3570 container finally notifies its resize policy that the search
3571 has ended (point C); it then queries its resize policy whether
3572 a resize is needed, and if so, what is the new size (points D
3573 to G); following the resize, it notifies the policy that a
3574 resize has completed (point H); finally, the element is
3575 inserted, and the policy notified (point I).</para>
3576
3577 <figure>
3578 <title>Insert resize sequence diagram</title>
3579 <mediaobject>
3580 <imageobject>
3581 <imagedata align="center" format="PNG" scale="100"
3582 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram1.png"/>
3583 </imageobject>
3584 <textobject>
3585 <phrase>Insert resize sequence diagram</phrase>
3586 </textobject>
3587 </mediaobject>
3588 </figure>
3589
3590
3591 <para>In practice, a resize policy can be usually orthogonally
3592 decomposed to a size policy and a trigger policy. Consequently,
3593 the library contains a single class for instantiating a resize
3594 policy: <classname>hash_standard_resize_policy</classname>
3595 is parametrized by <classname>Size_Policy</classname> and
3596 <classname>Trigger_Policy</classname>, derives <classname>public</classname>ly from
3597 both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
3598 to these policies.</para>
3599
3600 <para>The two graphics immediately below show sequence diagrams
3601 illustrating the interaction between the standard resize policy
3602 and its trigger and size policies, respectively.</para>
3603
3604 <figure>
3605 <title>Standard resize policy trigger sequence
3606 diagram</title>
3607 <mediaobject>
3608 <imageobject>
3609 <imagedata align="center" format="PNG" scale="100"
3610 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram2.png"/>
3611 </imageobject>
3612 <textobject>
3613 <phrase>Standard resize policy trigger sequence
3614 diagram</phrase>
3615 </textobject>
3616 </mediaobject>
3617 </figure>
3618
3619 <figure>
3620 <title>Standard resize policy size sequence
3621 diagram</title>
3622 <mediaobject>
3623 <imageobject>
3624 <imagedata align="center" format="PNG" scale="100"
3625 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram3.png"/>
3626 </imageobject>
3627 <textobject>
3628 <phrase>Standard resize policy size sequence
3629 diagram</phrase>
3630 </textobject>
3631 </mediaobject>
3632 </figure>
3633
3634
3635 </section>
3636
3637 <section xml:id="resize_policies.impl.predefined">
3638 <info><title>Predefined Policies</title></info>
3639 <para>The library includes the following
3640 instantiations of size and trigger policies:</para>
3641
3642 <orderedlist>
3643 <listitem><para><classname>hash_load_check_resize_trigger</classname>
3644 implements a load check trigger policy.</para></listitem>
3645
3646 <listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3647 implements a collision check trigger policy.</para></listitem>
3648
3649 <listitem><para><classname>hash_exponential_size_policy</classname>
3650 implements an exponential-size policy (which should be used
3651 with mask range hashing).</para></listitem>
3652
3653 <listitem><para><classname>hash_prime_size_policy</classname>
3654 implementing a size policy based on a sequence of primes
3655 (which should
3656 be used with mod range hashing</para></listitem>
3657 </orderedlist>
3658
3659 <para>The graphic below gives an overall picture of the resize-related
3660 classes. <classname>basic_hash_table</classname>
3661 is parametrized by <classname>Resize_Policy</classname>, which it subclasses
3662 publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
3663 <classname>hash_standard_resize_policy</classname>
3664 itself is parametrized by <classname>Trigger_Policy</classname> and
3665 <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
3666 instantiated by <classname>hash_load_check_resize_trigger</classname>,
3667 or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
3668 <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
3669 or <classname>hash_prime_size_policy</classname>.</para>
3670
3671 </section>
3672
3673 <section xml:id="resize_policies.impl.internals">
3674 <info><title>Controling Access to Internals</title></info>
3675
3676 <para>There are cases where (controlled) access to resize
3677 policies' internals is beneficial. E.g., it is sometimes
3678 useful to query a hash-table for the table's actual size (as
3679 opposed to its <function>size()</function> - the number of values it
3680 currently holds); it is sometimes useful to set a table's
3681 initial size, externally resize it, or change load factors.</para>
3682
3683 <para>Clearly, supporting such methods both decreases the
3684 encapsulation of hash-based containers, and increases the
3685 diversity between different associative-containers' interfaces.
3686 Conversely, omitting such methods can decrease containers'
3687 flexibility.</para>
3688
3689 <para>In order to avoid, to the extent possible, the above
3690 conflict, the hash-based containers themselves do not address
3691 any of these questions; this is deferred to the resize policies,
3692 which are easier to change or replace. Thus, for example,
3693 neither <classname>cc_hash_table</classname> nor
3694 <classname>gp_hash_table</classname>
3695 contain methods for querying the actual size of the table; this
3696 is deferred to <classname>hash_standard_resize_policy</classname>.</para>
3697
3698 <para>Furthermore, the policies themselves are parametrized by
3699 template arguments that determine the methods they support
3700 (
3701 <xref linkend="biblio.alexandrescu01modern"/>
3702 shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3703 is parametrized by <classname>External_Size_Access</classname> that
3704 determines whether it supports methods for querying the actual
3705 size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3706 is parametrized by <classname>External_Load_Access</classname> that
3707 determines whether it supports methods for querying or
3708 modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3709 is parametrized by <classname>External_Load_Access</classname> that
3710 determines whether it supports methods for querying the
3711 load.</para>
3712
3713 <para>Some operations, for example, resizing a container at
3714 run time, or changing the load factors of a load-check trigger
3715 policy, require the container itself to resize. As mentioned
3716 above, the hash-based containers themselves do not contain
3717 these types of methods, only their resize policies.
3718 Consequently, there must be some mechanism for a resize policy
3719 to manipulate the hash-based container. As the hash-based
3720 container is a subclass of the resize policy, this is done
3721 through virtual methods. Each hash-based container has a
3722 <classname>private</classname> <classname>virtual</classname> method:</para>
3723 <programlisting>
3724 virtual void
3725 do_resize
3726 (size_type new_size);
3727 </programlisting>
3728
3729 <para>which resizes the container. Implementations of
3730 <classname>Resize_Policy</classname> can export public methods for resizing
3731 the container externally; these methods internally call
3732 <classname>do_resize</classname> to resize the table.</para>
3733
3734
3735 </section>
3736
3737 </section>
3738
3739
3740 </section> <!-- resize policies -->
3741
3742 <section xml:id="container.hash.details.policy_interaction">
3743 <info><title>Policy Interactions</title></info>
3744 <para>
3745 </para>
3746 <para>Hash-tables are unfortunately especially susceptible to
3747 choice of policies. One of the more complicated aspects of this
3748 is that poor combinations of good policies can form a poor
3749 container. Following are some considerations.</para>
3750
3751 <section xml:id="policy_interaction.probesizetrigger">
3752 <info><title>probe/size/trigger</title></info>
3753
3754 <para>Some combinations do not work well for probing containers.
3755 For example, combining a quadratic probe policy with an
3756 exponential size policy can yield a poor container: when an
3757 element is inserted, a trigger policy might decide that there
3758 is no need to resize, as the table still contains unused
3759 entries; the probe sequence, however, might never reach any of
3760 the unused entries.</para>
3761
3762 <para>Unfortunately, this library cannot detect such problems at
3763 compilation (they are halting reducible). It therefore defines
3764 an exception class <classname>insert_error</classname> to throw an
3765 exception in this case.</para>
3766
3767 </section>
3768
3769 <section xml:id="policy_interaction.hashtrigger">
3770 <info><title>hash/trigger</title></info>
3771
3772 <para>Some trigger policies are especially susceptible to poor
3773 hash functions. Suppose, as an extreme case, that the hash
3774 function transforms each key to the same hash value. After some
3775 inserts, a collision detecting policy will always indicate that
3776 the container needs to grow.</para>
3777
3778 <para>The library, therefore, by design, limits each operation to
3779 one resize. For each <classname>insert</classname>, for example, it queries
3780 only once whether a resize is needed.</para>
3781
3782 </section>
3783
3784 <section xml:id="policy_interaction.eqstorehash">
3785 <info><title>equivalence functors/storing hash values/hash</title></info>
3786
3787 <para><classname>cc_hash_table</classname> and
3788 <classname>gp_hash_table</classname> are
3789 parametrized by an equivalence functor and by a
3790 <classname>Store_Hash</classname> parameter. If the latter parameter is
3791 <classname>true</classname>, then the container stores with each entry
3792 a hash value, and uses this value in case of collisions to
3793 determine whether to apply a hash value. This can lower the
3794 cost of collision for some types, but increase the cost of
3795 collisions for other types.</para>
3796
3797 <para>If a ranged-hash function or ranged probe function is
3798 directly supplied, however, then it makes no sense to store the
3799 hash value with each entry. This library's container will
3800 fail at compilation, by design, if this is attempted.</para>
3801
3802 </section>
3803
3804 <section xml:id="policy_interaction.sizeloadtrigger">
3805 <info><title>size/load-check trigger</title></info>
3806
3807 <para>Assume a size policy issues an increasing sequence of sizes
3808 a, a q, a q<superscript>1</superscript>, a q<superscript>2</superscript>, ... For
3809 example, an exponential size policy might issue the sequence of
3810 sizes 8, 16, 32, 64, ...</para>
3811
3812 <para>If a load-check trigger policy is used, with loads
3813 α<subscript>min</subscript> and α<subscript>max</subscript>,
3814 respectively, then it is a good idea to have:</para>
3815
3816 <orderedlist>
3817 <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>
3818
3819 <listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
3820 </orderedlist>
3821
3822 <para>This will ensure that the amortized hash cost of each
3823 modifying operation is at most approximately 3.</para>
3824
3825 <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
3826 any case, a bad choice, and α<subscript>min</subscript> &gt;
3827 α <subscript>max</subscript> is horrendous.</para>
3828
3829 </section>
3830
3831 </section>
3832
3833 </section> <!-- details -->
3834
3835 </section> <!-- hash -->
3836
3837 <!-- tree -->
3838 <section xml:id="pbds.design.container.tree">
3839 <info><title>tree</title></info>
3840
3841 <section xml:id="container.tree.interface">
3842 <info><title>Interface</title></info>
3843
3844 <para>The tree-based container has the following declaration:</para>
3845 <programlisting>
3846 template&lt;
3847 typename Key,
3848 typename Mapped,
3849 typename Cmp_Fn = std::less&lt;Key&gt;,
3850 typename Tag = rb_tree_tag,
3851 template&lt;
3852 typename Const_Node_Iterator,
3853 typename Node_Iterator,
3854 typename Cmp_Fn_,
3855 typename Allocator_&gt;
3856 class Node_Update = null_node_update,
3857 typename Allocator = std::allocator&lt;char&gt; &gt;
3858 class tree;
3859 </programlisting>
3860
3861 <para>The parameters have the following meaning:</para>
3862
3863 <orderedlist>
3864 <listitem>
3865 <para><classname>Key</classname> is the key type.</para></listitem>
3866
3867 <listitem>
3868 <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
3869
3870 <listitem>
3871 <para><classname>Cmp_Fn</classname> is a key comparison functor</para></listitem>
3872
3873 <listitem>
3874 <para><classname>Tag</classname> specifies which underlying data structure
3875 to use.</para></listitem>
3876
3877 <listitem>
3878 <para><classname>Node_Update</classname> is a policy for updating node
3879 invariants.</para></listitem>
3880
3881 <listitem>
3882 <para><classname>Allocator</classname> is an allocator
3883 type.</para></listitem>
3884 </orderedlist>
3885
3886 <para>The <classname>Tag</classname> parameter specifies which underlying
3887 data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
3888 <classname>ov_tree_tag</classname>,
3889 specifies an underlying red-black tree, splay tree, or
3890 ordered-vector tree, respectively; any other tag is illegal.
3891 Note that containers based on the former two contain more types
3892 and methods than the latter (e.g.,
3893 <classname>reverse_iterator</classname> and <classname>rbegin</classname>), and different
3894 exception and invalidation guarantees.</para>
3895
3896 </section>
3897
3898 <section xml:id="container.tree.details">
3899 <info><title>Details</title></info>
3900
3901 <section xml:id="container.tree.node">
3902 <info><title>Node Invariants</title></info>
3903
3904
3905 <para>Consider the two trees in the graphic below, labels A and B. The first
3906 is a tree of floats; the second is a tree of pairs, each
3907 signifying a geometric line interval. Each element in a tree is refered to as a node of the tree. Of course, each of
3908 these trees can support the usual queries: the first can easily
3909 search for <classname>0.4</classname>; the second can easily search for
3910 <classname>std::make_pair(10, 41)</classname>.</para>
3911
3912 <para>Each of these trees can efficiently support other queries.
3913 The first can efficiently determine that the 2rd key in the
3914 tree is <constant>0.3</constant>; the second can efficiently determine
3915 whether any of its intervals overlaps
3916 <programlisting>std::make_pair(29,42)</programlisting> (useful in geometric
3917 applications or distributed file systems with leases, for
3918 example). It should be noted that an <classname>std::set</classname> can
3919 only solve these types of problems with linear complexity.</para>
3920
3921 <para>In order to do so, each tree stores some metadata in
3922 each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>.) The first stores in
3923 each node the size of the sub-tree rooted at the node; the
3924 second stores at each node the maximal endpoint of the
3925 intervals at the sub-tree rooted at the node.</para>
3926
3927 <figure>
3928 <title>Tree node invariants</title>
3929 <mediaobject>
3930 <imageobject>
3931 <imagedata align="center" format="PNG" scale="100"
3932 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_invariants.png"/>
3933 </imageobject>
3934 <textobject>
3935 <phrase>Tree node invariants</phrase>
3936 </textobject>
3937 </mediaobject>
3938 </figure>
3939
3940 <para>Supporting such trees is difficult for a number of
3941 reasons:</para>
3942
3943 <orderedlist>
3944 <listitem><para>There must be a way to specify what a node's metadata
3945 should be (if any).</para></listitem>
3946
3947 <listitem><para>Various operations can invalidate node
3948 invariants. The graphic below shows how a right rotation,
3949 performed on A, results in B, with nodes x and y having
3950 corrupted invariants (the grayed nodes in C). The graphic shows
3951 how an insert, performed on D, results in E, with nodes x and y
3952 having corrupted invariants (the grayed nodes in F). It is not
3953 feasible to know outside the tree the effect of an operation on
3954 the nodes of the tree.</para></listitem>
3955
3956 <listitem><para>The search paths of standard associative containers are
3957 defined by comparisons between keys, and not through
3958 metadata.</para></listitem>
3959
3960 <listitem><para>It is not feasible to know in advance which methods trees
3961 can support. Besides the usual <classname>find</classname> method, the
3962 first tree can support a <classname>find_by_order</classname> method, while
3963 the second can support an <classname>overlaps</classname> method.</para></listitem>
3964 </orderedlist>
3965
3966 <figure>
3967 <title>Tree node invalidation</title>
3968 <mediaobject>
3969 <imageobject>
3970 <imagedata align="center" format="PNG" scale="100"
3971 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_invalidations.png"/>
3972 </imageobject>
3973 <textobject>
3974 <phrase>Tree node invalidation</phrase>
3975 </textobject>
3976 </mediaobject>
3977 </figure>
3978
3979 <para>These problems are solved by a combination of two means:
3980 node iterators, and template-template node updater
3981 parameters.</para>
3982
3983 <section xml:id="container.tree.node.iterators">
3984 <info><title>Node Iterators</title></info>
3985
3986
3987 <para>Each tree-based container defines two additional iterator
3988 types, <classname>const_node_iterator</classname>
3989 and <classname>node_iterator</classname>.
3990 These iterators allow descending from a node to one of its
3991 children. Node iterator allow search paths different than those
3992 determined by the comparison functor. The <classname>tree</classname>
3993 supports the methods:</para>
3994 <programlisting>
3995 const_node_iterator
3996 node_begin() const;
3997
3998 node_iterator
3999 node_begin();
4000
4001 const_node_iterator
4002 node_end() const;
4003
4004 node_iterator
4005 node_end();
4006 </programlisting>
4007
4008 <para>The first pairs return node iterators corresponding to the
4009 root node of the tree; the latter pair returns node iterators
4010 corresponding to a just-after-leaf node.</para>
4011 </section>
4012
4013 <section xml:id="container.tree.node.updator">
4014 <info><title>Node Updator</title></info>
4015
4016 <para>The tree-based containers are parametrized by a
4017 <classname>Node_Update</classname> template-template parameter. A
4018 tree-based container instantiates
4019 <classname>Node_Update</classname> to some
4020 <classname>node_update</classname> class, and publicly subclasses
4021 <classname>node_update</classname>. The graphic below shows this
4022 scheme, as well as some predefined policies (which are explained
4023 below).</para>
4024
4025 <figure>
4026 <title>A tree and its update policy</title>
4027 <mediaobject>
4028 <imageobject>
4029 <imagedata align="center" format="PNG" scale="100"
4030 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_updator_policy_cd.png"/>
4031 </imageobject>
4032 <textobject>
4033 <phrase>A tree and its update policy</phrase>
4034 </textobject>
4035 </mediaobject>
4036 </figure>
4037
4038 <para><classname>node_update</classname> (an instantiation of
4039 <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4040 the type of metadata it requires. For order statistics,
4041 e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4042 The tree defines within each node a <classname>metadata_type</classname>
4043 object.</para>
4044
4045 <para><classname>node_update</classname> must also define the following method
4046 for restoring node invariants:</para>
4047 <programlisting>
4048 void
4049 operator()(node_iterator nd_it, const_node_iterator end_nd_it)
4050 </programlisting>
4051
4052 <para>In this method, <varname>nd_it</varname> is a
4053 <classname>node_iterator</classname> corresponding to a node whose
4054 A) all descendants have valid invariants, and B) its own
4055 invariants might be violated; <classname>end_nd_it</classname> is
4056 a <classname>const_node_iterator</classname> corresponding to a
4057 just-after-leaf node. This method should correct the node
4058 invariants of the node pointed to by
4059 <classname>nd_it</classname>. For example, say node x in the
4060 graphic below label A has an invalid invariant, but its' children,
4061 y and z have valid invariants. After the invocation, all three
4062 nodes should have valid invariants, as in label B.</para>
4063
4064
4065 <figure>
4066 <title>Restoring node invariants</title>
4067 <mediaobject>
4068 <imageobject>
4069 <imagedata align="center" format="PNG" scale="100"
4070 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_restoring_node_invariants.png"/>
4071 </imageobject>
4072 <textobject>
4073 <phrase>Restoring node invariants</phrase>
4074 </textobject>
4075 </mediaobject>
4076 </figure>
4077
4078 <para>When a tree operation might invalidate some node invariant,
4079 it invokes this method in its <classname>node_update</classname> base to
4080 restore the invariant. For example, the graphic below shows
4081 an <function>insert</function> operation (point A); the tree performs some
4082 operations, and calls the update functor three times (points B,
4083 C, and D). (It is well known that any <function>insert</function>,
4084 <function>erase</function>, <function>split</function> or <function>join</function>, can restore
4085 all node invariants by a small number of node invariant updates (<xref linkend="biblio.clrs2001"/>)
4086 .</para>
4087
4088 <figure>
4089 <title>Insert update sequence</title>
4090 <mediaobject>
4091 <imageobject>
4092 <imagedata align="center" format="PNG" scale="100"
4093 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_update_seq_diagram.png"/>
4094 </imageobject>
4095 <textobject>
4096 <phrase>Insert update sequence</phrase>
4097 </textobject>
4098 </mediaobject>
4099 </figure>
4100
4101 <para>To complete the description of the scheme, three questions
4102 need to be answered:</para>
4103
4104 <orderedlist>
4105 <listitem><para>How can a tree which supports order statistics define a
4106 method such as <classname>find_by_order</classname>?</para></listitem>
4107
4108 <listitem><para>How can the node updater base access methods of the
4109 tree?</para></listitem>
4110
4111 <listitem><para>How can the following cyclic dependency be resolved?
4112 <classname>node_update</classname> is a base class of the tree, yet it
4113 uses node iterators defined in the tree (its child).</para></listitem>
4114 </orderedlist>
4115
4116 <para>The first two questions are answered by the fact that
4117 <classname>node_update</classname> (an instantiation of
4118 <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4119 of the tree. Consequently:</para>
4120
4121 <orderedlist>
4122 <listitem><para>Any public methods of
4123 <classname>node_update</classname> are automatically methods of
4124 the tree (<xref linkend="biblio.alexandrescu01modern"/>).
4125 Thus an order-statistics node updater,
4126 <classname>tree_order_statistics_node_update</classname> defines
4127 the <function>find_by_order</function> method; any tree
4128 instantiated by this policy consequently supports this method as
4129 well.</para></listitem>
4130
4131 <listitem><para>In C++, if a base class declares a method as
4132 <literal>virtual</literal>, it is
4133 <literal>virtual</literal> in its subclasses. If
4134 <classname>node_update</classname> needs to access one of the
4135 tree's methods, say the member function
4136 <function>end</function>, it simply declares that method as
4137 <literal>virtual</literal> abstract.</para></listitem>
4138 </orderedlist>
4139
4140 <para>The cyclic dependency is solved through template-template
4141 parameters. <classname>Node_Update</classname> is parametrized by
4142 the tree's node iterators, its comparison functor, and its
4143 allocator type. Thus, instantiations of
4144 <classname>Node_Update</classname> have all information
4145 required.</para>
4146
4147 <para>This library assumes that constructing a metadata object and
4148 modifying it are exception free. Suppose that during some method,
4149 say <classname>insert</classname>, a metadata-related operation
4150 (e.g., changing the value of a metadata) throws an exception. Ack!
4151 Rolling back the method is unusually complex.</para>
4152
4153 <para>Previously, a distinction was made between redundant
4154 policies and null policies. Node invariants show a
4155 case where null policies are required.</para>
4156
4157 <para>Assume a regular tree is required, one which need not
4158 support order statistics or interval overlap queries.
4159 Seemingly, in this case a redundant policy - a policy which
4160 doesn't affect nodes' contents would suffice. This, would lead
4161 to the following drawbacks:</para>
4162
4163 <orderedlist>
4164 <listitem><para>Each node would carry a useless metadata object, wasting
4165 space.</para></listitem>
4166
4167 <listitem><para>The tree cannot know if its
4168 <classname>Node_Update</classname> policy actually modifies a
4169 node's metadata (this is halting reducible). In the graphic
4170 below, assume the shaded node is inserted. The tree would have
4171 to traverse the useless path shown to the root, applying
4172 redundant updates all the way.</para></listitem>
4173 </orderedlist>
4174 <figure>
4175 <title>Useless update path</title>
4176 <mediaobject>
4177 <imageobject>
4178 <imagedata align="center" format="PNG" scale="100"
4179 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_rationale_null_node_updator.png"/>
4180 </imageobject>
4181 <textobject>
4182 <phrase>Useless update path</phrase>
4183 </textobject>
4184 </mediaobject>
4185 </figure>
4186
4187
4188 <para>A null policy class, <classname>null_node_update</classname>
4189 solves both these problems. The tree detects that node
4190 invariants are irrelevant, and defines all accordingly.</para>
4191
4192 </section>
4193
4194 </section>
4195
4196 <section xml:id="container.tree.details.split">
4197 <info><title>Split and Join</title></info>
4198
4199 <para>Tree-based containers support split and join methods.
4200 It is possible to split a tree so that it passes
4201 all nodes with keys larger than a given key to a different
4202 tree. These methods have the following advantages over the
4203 alternative of externally inserting to the destination
4204 tree and erasing from the source tree:</para>
4205
4206 <orderedlist>
4207 <listitem><para>These methods are efficient - red-black trees are split
4208 and joined in poly-logarithmic complexity; ordered-vector
4209 trees are split and joined at linear complexity. The
4210 alternatives have super-linear complexity.</para></listitem>
4211
4212 <listitem><para>Aside from orders of growth, these operations perform
4213 few allocations and de-allocations. For red-black trees, allocations are not performed,
4214 and the methods are exception-free. </para></listitem>
4215 </orderedlist>
4216 </section>
4217
4218 </section> <!-- details -->
4219
4220 </section> <!-- tree -->
4221
4222 <!-- trie -->
4223 <section xml:id="pbds.design.container.trie">
4224 <info><title>Trie</title></info>
4225
4226 <section xml:id="container.trie.interface">
4227 <info><title>Interface</title></info>
4228
4229 <para>The trie-based container has the following declaration:</para>
4230 <programlisting>
4231 template&lt;typename Key,
4232 typename Mapped,
4233 typename Cmp_Fn = std::less&lt;Key&gt;,
4234 typename Tag = pat_trie_tag,
4235 template&lt;typename Const_Node_Iterator,
4236 typename Node_Iterator,
4237 typename E_Access_Traits_,
4238 typename Allocator_&gt;
4239 class Node_Update = null_node_update,
4240 typename Allocator = std::allocator&lt;char&gt; &gt;
4241 class trie;
4242 </programlisting>
4243
4244 <para>The parameters have the following meaning:</para>
4245
4246 <orderedlist>
4247 <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4248
4249 <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4250
4251 <listitem><para><classname>E_Access_Traits</classname> is described in below.</para></listitem>
4252
4253 <listitem><para><classname>Tag</classname> specifies which underlying data structure
4254 to use, and is described shortly.</para></listitem>
4255
4256 <listitem><para><classname>Node_Update</classname> is a policy for updating node
4257 invariants. This is described below.</para></listitem>
4258
4259 <listitem><para><classname>Allocator</classname> is an allocator
4260 type.</para></listitem>
4261 </orderedlist>
4262
4263 <para>The <classname>Tag</classname> parameter specifies which underlying
4264 data structure to use. Instantiating it by <classname>pat_trie_tag</classname>, specifies an
4265 underlying PATRICIA trie (explained shortly); any other tag is
4266 currently illegal.</para>
4267
4268 <para>Following is a description of a (PATRICIA) trie
4269 (this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
4270 <xref linkend="biblio.filliatre2000ptset"/>).
4271 </para>
4272
4273 <para>A (PATRICIA) trie is similar to a tree, but with the
4274 following differences:</para>
4275
4276 <orderedlist>
4277 <listitem><para>It explicitly views keys as a sequence of elements.
4278 E.g., a trie can view a string as a sequence of
4279 characters; a trie can view a number as a sequence of
4280 bits.</para></listitem>
4281
4282 <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4283 + 1, where n is the number of distinct
4284 elements.</para></listitem>
4285
4286 <listitem><para>It stores values only at leaf nodes.</para></listitem>
4287
4288 <listitem><para>Internal nodes have the properties that A) each has at
4289 least two children, and B) each shares the same prefix with
4290 any of its descendant.</para></listitem>
4291 </orderedlist>
4292
4293 <para>A (PATRICIA) trie has some useful properties:</para>
4294
4295 <orderedlist>
4296 <listitem><para>It can be configured to use large node fan-out, giving it
4297 very efficient find performance (albeit at insertion
4298 complexity and size).</para></listitem>
4299
4300 <listitem><para>It works well for common-prefix keys.</para></listitem>
4301
4302 <listitem><para>It can support efficiently queries such as which
4303 keys match a certain prefix. This is sometimes useful in file
4304 systems and routers, and for "type-ahead" aka predictive text matching
4305 on mobile devices.</para></listitem>
4306 </orderedlist>
4307
4308
4309 </section>
4310
4311 <section xml:id="container.trie.details">
4312 <info><title>Details</title></info>
4313
4314 <section xml:id="container.trie.details.etraits">
4315 <info><title>Element Access Traits</title></info>
4316
4317 <para>A trie inherently views its keys as sequences of elements.
4318 For example, a trie can view a string as a sequence of
4319 characters. A trie needs to map each of n elements to a
4320 number in {0, n - 1}. For example, a trie can map a
4321 character <varname>c</varname> to
4322 <programlisting>static_cast&lt;size_t&gt;(c)</programlisting>.</para>
4323
4324 <para>Seemingly, then, a trie can assume that its keys support
4325 (const) iterators, and that the <classname>value_type</classname> of this
4326 iterator can be cast to a <classname>size_t</classname>. There are several
4327 reasons, though, to decouple the mechanism by which the trie
4328 accesses its keys' elements from the trie:</para>
4329
4330 <orderedlist>
4331 <listitem><para>In some cases, the numerical value of an element is
4332 inappropriate. Consider a trie storing DNA strings. It is
4333 logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4334 'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4335
4336 <listitem><para>In some cases the keys' iterators are different than what
4337 is needed. For example, a trie can be used to search for
4338 common suffixes, by using strings'
4339 <classname>reverse_iterator</classname>. As another example, a trie mapping
4340 UNICODE strings would have a huge fan-out if each node would
4341 branch on a UNICODE character; instead, one can define an
4342 iterator iterating over 8-bit (or less) groups.</para></listitem>
4343 </orderedlist>
4344
4345 <para>trie is,
4346 consequently, parametrized by <classname>E_Access_Traits</classname> -
4347 traits which instruct how to access sequences' elements.
4348 <classname>string_trie_e_access_traits</classname>
4349 is a traits class for strings. Each such traits define some
4350 types, like:</para>
4351 <programlisting>
4352 typename E_Access_Traits::const_iterator
4353 </programlisting>
4354
4355 <para>is a const iterator iterating over a key's elements. The
4356 traits class must also define methods for obtaining an iterator
4357 to the first and last element of a key.</para>
4358
4359 <para>The graphic below shows a
4360 (PATRICIA) trie resulting from inserting the words: "I wish
4361 that I could ever see a poem lovely as a trie" (which,
4362 unfortunately, does not rhyme).</para>
4363
4364 <para>The leaf nodes contain values; each internal node contains
4365 two <classname>typename E_Access_Traits::const_iterator</classname>
4366 objects, indicating the maximal common prefix of all keys in
4367 the sub-tree. For example, the shaded internal node roots a
4368 sub-tree with leafs "a" and "as". The maximal common prefix is
4369 "a". The internal node contains, consequently, to const
4370 iterators, one pointing to <varname>'a'</varname>, and the other to
4371 <varname>'s'</varname>.</para>
4372
4373 <figure>
4374 <title>A PATRICIA trie</title>
4375 <mediaobject>
4376 <imageobject>
4377 <imagedata align="center" format="PNG" scale="100"
4378 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_pat_trie.png"/>
4379 </imageobject>
4380 <textobject>
4381 <phrase>A PATRICIA trie</phrase>
4382 </textobject>
4383 </mediaobject>
4384 </figure>
4385
4386 </section>
4387
4388 <section xml:id="container.trie.details.node">
4389 <info><title>Node Invariants</title></info>
4390
4391 <para>Trie-based containers support node invariants, as do
4392 tree-based containers. There are two minor
4393 differences, though, which, unfortunately, thwart sharing them
4394 sharing the same node-updating policies:</para>
4395
4396 <orderedlist>
4397 <listitem>
4398 <para>A trie's <classname>Node_Update</classname> template-template
4399 parameter is parametrized by <classname>E_Access_Traits</classname>, while
4400 a tree's <classname>Node_Update</classname> template-template parameter is
4401 parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4402
4403 <listitem><para>Tree-based containers store values in all nodes, while
4404 trie-based containers (at least in this implementation) store
4405 values in leafs.</para></listitem>
4406 </orderedlist>
4407
4408 <para>The graphic below shows the scheme, as well as some predefined
4409 policies (which are explained below).</para>
4410
4411 <figure>
4412 <title>A trie and its update policy</title>
4413 <mediaobject>
4414 <imageobject>
4415 <imagedata align="center" format="PNG" scale="100"
4416 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_trie_node_updator_policy_cd.png"/>
4417 </imageobject>
4418 <textobject>
4419 <phrase>A trie and its update policy</phrase>
4420 </textobject>
4421 </mediaobject>
4422 </figure>
4423
4424
4425 <para>This library offers the following pre-defined trie node
4426 updating policies:</para>
4427
4428 <orderedlist>
4429 <listitem>
4430 <para>
4431 <classname>trie_order_statistics_node_update</classname>
4432 supports order statistics.
4433 </para>
4434 </listitem>
4435
4436 <listitem><para><classname>trie_prefix_search_node_update</classname>
4437 supports searching for ranges that match a given prefix.</para></listitem>
4438
4439 <listitem><para><classname>null_node_update</classname>
4440 is the null node updater.</para></listitem>
4441 </orderedlist>
4442
4443 </section>
4444
4445 <section xml:id="container.trie.details.split">
4446 <info><title>Split and Join</title></info>
4447 <para>Trie-based containers support split and join methods; the
4448 rationale is equal to that of tree-based containers supporting
4449 these methods.</para>
4450 </section>
4451
4452 </section> <!-- details -->
4453
4454 </section> <!-- trie -->
4455
4456 <!-- list_update -->
4457 <section xml:id="pbds.design.container.list">
4458 <info><title>List</title></info>
4459
4460 <section xml:id="container.list.interface">
4461 <info><title>Interface</title></info>
4462
4463 <para>The list-based container has the following declaration:</para>
4464 <programlisting>
4465 template&lt;typename Key,
4466 typename Mapped,
4467 typename Eq_Fn = std::equal_to&lt;Key&gt;,
4468 typename Update_Policy = move_to_front_lu_policy&lt;&gt;,
4469 typename Allocator = std::allocator&lt;char&gt; &gt;
4470 class list_update;
4471 </programlisting>
4472
4473 <para>The parameters have the following meaning:</para>
4474
4475 <orderedlist>
4476 <listitem>
4477 <para>
4478 <classname>Key</classname> is the key type.
4479 </para>
4480 </listitem>
4481
4482 <listitem>
4483 <para>
4484 <classname>Mapped</classname> is the mapped-policy.
4485 </para>
4486 </listitem>
4487
4488 <listitem>
4489 <para>
4490 <classname>Eq_Fn</classname> is a key equivalence functor.
4491 </para>
4492 </listitem>
4493
4494 <listitem>
4495 <para>
4496 <classname>Update_Policy</classname> is a policy updating positions in
4497 the list based on access patterns. It is described in the
4498 following subsection.
4499 </para>
4500 </listitem>
4501
4502 <listitem>
4503 <para>
4504 <classname>Allocator</classname> is an allocator type.
4505 </para>
4506 </listitem>
4507 </orderedlist>
4508
4509 <para>A list-based associative container is a container that
4510 stores elements in a linked-list. It does not order the elements
4511 by any particular order related to the keys. List-based
4512 containers are primarily useful for creating "multimaps". In fact,
4513 list-based containers are designed in this library expressly for
4514 this purpose.</para>
4515
4516 <para>List-based containers might also be useful for some rare
4517 cases, where a key is encapsulated to the extent that only
4518 key-equivalence can be tested. Hash-based containers need to know
4519 how to transform a key into a size type, and tree-based containers
4520 need to know if some key is larger than another. List-based
4521 associative containers, conversely, only need to know if two keys
4522 are equivalent.</para>
4523
4524 <para>Since a list-based associative container does not order
4525 elements by keys, is it possible to order the list in some
4526 useful manner? Remarkably, many on-line competitive
4527 algorithms exist for reordering lists to reflect access
4528 prediction. (See <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
4529 </para>
4530
4531 </section>
4532
4533 <section xml:id="container.list.details">
4534 <info><title>Details</title></info>
4535 <para>
4536 </para>
4537 <section xml:id="container.list.details.ds">
4538 <info><title>Underlying Data Structure</title></info>
4539
4540 <para>The graphic below shows a
4541 simple list of integer keys. If we search for the integer 6, we
4542 are paying an overhead: the link with key 6 is only the fifth
4543 link; if it were the first link, it could be accessed
4544 faster.</para>
4545
4546 <figure>
4547 <title>A simple list</title>
4548 <mediaobject>
4549 <imageobject>
4550 <imagedata align="center" format="PNG" scale="100"
4551 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_simple_list.png"/>
4552 </imageobject>
4553 <textobject>
4554 <phrase>A simple list</phrase>
4555 </textobject>
4556 </mediaobject>
4557 </figure>
4558
4559 <para>List-update algorithms reorder lists as elements are
4560 accessed. They try to determine, by the access history, which
4561 keys to move to the front of the list. Some of these algorithms
4562 require adding some metadata alongside each entry.</para>
4563
4564 <para>For example, in the graphic below label A shows the counter
4565 algorithm. Each node contains both a key and a count metadata
4566 (shown in bold). When an element is accessed (e.g. 6) its count is
4567 incremented, as shown in label B. If the count reaches some
4568 predetermined value, say 10, as shown in label C, the count is set
4569 to 0 and the node is moved to the front of the list, as in label
4570 D.
4571 </para>
4572
4573 <figure>
4574 <title>The counter algorithm</title>
4575 <mediaobject>
4576 <imageobject>
4577 <imagedata align="center" format="PNG" scale="100"
4578 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_list_update.png"/>
4579 </imageobject>
4580 <textobject>
4581 <phrase>The counter algorithm</phrase>
4582 </textobject>
4583 </mediaobject>
4584 </figure>
4585
4586
4587 </section>
4588
4589 <section xml:id="container.list.details.policies">
4590 <info><title>Policies</title></info>
4591
4592 <para>this library allows instantiating lists with policies
4593 implementing any algorithm moving nodes to the front of the
4594 list (policies implementing algorithms interchanging nodes are
4595 unsupported).</para>
4596
4597 <para>Associative containers based on lists are parametrized by a
4598 <classname>Update_Policy</classname> parameter. This parameter defines the
4599 type of metadata each node contains, how to create the
4600 metadata, and how to decide, using this metadata, whether to
4601 move a node to the front of the list. A list-based associative
4602 container object derives (publicly) from its update policy.
4603 </para>
4604
4605 <para>An instantiation of <classname>Update_Policy</classname> must define
4606 internally <classname>update_metadata</classname> as the metadata it
4607 requires. Internally, each node of the list contains, besides
4608 the usual key and data, an instance of <classname>typename
4609 Update_Policy::update_metadata</classname>.</para>
4610
4611 <para>An instantiation of <classname>Update_Policy</classname> must define
4612 internally two operators:</para>
4613 <programlisting>
4614 update_metadata
4615 operator()();
4616
4617 bool
4618 operator()(update_metadata &amp;);
4619 </programlisting>
4620
4621 <para>The first is called by the container object, when creating a
4622 new node, to create the node's metadata. The second is called
4623 by the container object, when a node is accessed (
4624 when a find operation's key is equivalent to the key of the
4625 node), to determine whether to move the node to the front of
4626 the list.
4627 </para>
4628
4629 <para>The library contains two predefined implementations of
4630 list-update policies. The first
4631 is <classname>lu_counter_policy</classname>, which implements the
4632 counter algorithm described above. The second is
4633 <classname>lu_move_to_front_policy</classname>,
4634 which unconditionally move an accessed element to the front of
4635 the list. The latter type is very useful in this library,
4636 since there is no need to associate metadata with each element.
4637 (See <xref linkend="biblio.andrew04mtf"/>
4638 </para>
4639
4640 </section>
4641
4642 <section xml:id="container.list.details.mapped">
4643 <info><title>Use in Multimaps</title></info>
4644
4645 <para>In this library, there are no equivalents for the standard's
4646 multimaps and multisets; instead one uses an associative
4647 container mapping primary keys to secondary keys.</para>
4648
4649 <para>List-based containers are especially useful as associative
4650 containers for secondary keys. In fact, they are implemented
4651 here expressly for this purpose.</para>
4652
4653 <para>To begin with, these containers use very little per-entry
4654 structure memory overhead, since they can be implemented as
4655 singly-linked lists. (Arrays use even lower per-entry memory
4656 overhead, but they are less flexible in moving around entries,
4657 and have weaker invalidation guarantees).</para>
4658
4659 <para>More importantly, though, list-based containers use very
4660 little per-container memory overhead. The memory overhead of an
4661 empty list-based container is practically that of a pointer.
4662 This is important for when they are used as secondary
4663 associative-containers in situations where the average ratio of
4664 secondary keys to primary keys is low (or even 1).</para>
4665
4666 <para>In order to reduce the per-container memory overhead as much
4667 as possible, they are implemented as closely as possible to
4668 singly-linked lists.</para>
4669
4670 <orderedlist>
4671 <listitem>
4672 <para>
4673 List-based containers do not store internally the number
4674 of values that they hold. This means that their <function>size</function>
4675 method has linear complexity (just like <classname>std::list</classname>).
4676 Note that finding the number of equivalent-key values in a
4677 standard multimap also has linear complexity (because it must be
4678 done, via <function>std::distance</function> of the
4679 multimap's <function>equal_range</function> method), but usually with
4680 higher constants.
4681 </para>
4682 </listitem>
4683
4684 <listitem>
4685 <para>
4686 Most associative-container objects each hold a policy
4687 object (a hash-based container object holds a
4688 hash functor). List-based containers, conversely, only have
4689 class-wide policy objects.
4690 </para>
4691 </listitem>
4692 </orderedlist>
4693
4694
4695 </section>
4696
4697 </section> <!-- details -->
4698
4699 </section> <!-- list -->
4700
4701
4702 <!-- priority_queue -->
4703 <section xml:id="pbds.design.container.priority_queue">
4704 <info><title>Priority Queue</title></info>
4705
4706 <section xml:id="container.priority_queue.interface">
4707 <info><title>Interface</title></info>
4708
4709 <para>The priority queue container has the following
4710 declaration:
4711 </para>
4712 <programlisting>
4713 template&lt;typename Value_Type,
4714 typename Cmp_Fn = std::less&lt;Value_Type&gt;,
4715 typename Tag = pairing_heap_tag,
4716 typename Allocator = std::allocator&lt;char &gt; &gt;
4717 class priority_queue;
4718 </programlisting>
4719
4720 <para>The parameters have the following meaning:</para>
4721
4722 <orderedlist>
4723 <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4724
4725 <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor</para></listitem>
4726
4727 <listitem><para><classname>Tag</classname> specifies which underlying data structure
4728 to use.</para></listitem>
4729
4730 <listitem><para><classname>Allocator</classname> is an allocator
4731 type.</para></listitem>
4732 </orderedlist>
4733
4734 <para>The <classname>Tag</classname> parameter specifies which underlying
4735 data structure to use. Instantiating it by<classname>pairing_heap_tag</classname>,<classname>binary_heap_tag</classname>,
4736 <classname>binomial_heap_tag</classname>,
4737 <classname>rc_binomial_heap_tag</classname>,
4738 or <classname>thin_heap_tag</classname>,
4739 specifies, respectively,
4740 an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
4741 binary heap (<xref linkend="biblio.clrs2001"/>),
4742 binomial heap (<xref linkend="biblio.clrs2001"/>),
4743 a binomial heap with a redundant binary counter (<xref linkend="biblio.maverik_lowerbounds"/>),
4744 or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
4745 </para>
4746
4747 <para>
4748 As mentioned in the tutorial,
4749 <classname>__gnu_pbds::priority_queue</classname> shares most of the
4750 same interface with <classname>std::priority_queue</classname>.
4751 E.g. if <varname>q</varname> is a priority queue of type
4752 <classname>Q</classname>, then <function>q.top()</function> will
4753 return the "largest" value in the container (according to
4754 <classname>typename
4755 Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
4756 has a larger (and very slightly different) interface than
4757 <classname>std::priority_queue</classname>, however, since typically
4758 <classname>push</classname> and <classname>pop</classname> are deemed
4759 insufficient for manipulating priority-queues. </para>
4760
4761 <para>Different settings require different priority-queue
4762 implementations which are described in later; see traits
4763 discusses ways to differentiate between the different traits of
4764 different implementations.</para>
4765
4766
4767 </section>
4768
4769 <section xml:id="container.priority_queue.details">
4770 <info><title>Details</title></info>
4771
4772 <section xml:id="container.priority_queue.details.iterators">
4773 <info><title>Iterators</title></info>
4774
4775 <para>There are many different underlying-data structures for
4776 implementing priority queues. Unfortunately, most such
4777 structures are oriented towards making <function>push</function> and
4778 <function>top</function> efficient, and consequently don't allow efficient
4779 access of other elements: for instance, they cannot support an efficient
4780 <function>find</function> method. In the use case where it
4781 is important to both access and "do something with" an
4782 arbitrary value, one would be out of luck. For example, many graph algorithms require
4783 modifying a value (typically increasing it in the sense of the
4784 priority queue's comparison functor).</para>
4785
4786 <para>In order to access and manipulate an arbitrary value in a
4787 priority queue, one needs to reference the internals of the
4788 priority queue from some form of an associative container -
4789 this is unavoidable. Of course, in order to maintain the
4790 encapsulation of the priority queue, this needs to be done in a
4791 way that minimizes exposure to implementation internals.</para>
4792
4793 <para>In this library the priority queue's <function>insert</function>
4794 method returns an iterator, which if valid can be used for subsequent <function>modify</function> and
4795 <function>erase</function> operations. This both preserves the priority
4796 queue's encapsulation, and allows accessing arbitrary values (since the
4797 returned iterators from the <function>push</function> operation can be
4798 stored in some form of associative container).</para>
4799
4800 <para>Priority queues' iterators present a problem regarding their
4801 invalidation guarantees. One assumes that calling
4802 <function>operator++</function> on an iterator will associate it
4803 with the "next" value. Priority-queues are
4804 self-organizing: each operation changes what the "next" value
4805 means. Consequently, it does not make sense that <function>push</function>
4806 will return an iterator that can be incremented - this can have
4807 no possible use. Also, as in the case of hash-based containers,
4808 it is awkward to define if a subsequent <function>push</function> operation
4809 invalidates a prior returned iterator: it invalidates it in the
4810 sense that its "next" value is not related to what it
4811 previously considered to be its "next" value. However, it might not
4812 invalidate it, in the sense that it can be
4813 de-referenced and used for <function>modify</function> and <function>erase</function>
4814 operations.</para>
4815
4816 <para>Similarly to the case of the other unordered associative
4817 containers, this library uses a distinction between
4818 point-type and range type iterators. A priority queue's <classname>iterator</classname> can always be
4819 converted to a <classname>point_iterator</classname>, and a
4820 <classname>const_iterator</classname> can always be converted to a
4821 <classname>point_const_iterator</classname>.</para>
4822
4823 <para>The following snippet demonstrates manipulating an arbitrary
4824 value:</para>
4825 <programlisting>
4826 // A priority queue of integers.
4827 priority_queue&lt;int &gt; p;
4828
4829 // Insert some values into the priority queue.
4830 priority_queue&lt;int &gt;::point_iterator it = p.push(0);
4831
4832 p.push(1);
4833 p.push(2);
4834
4835 // Now modify a value.
4836 p.modify(it, 3);
4837
4838 assert(p.top() == 3);
4839 </programlisting>
4840
4841
4842 <para>It should be noted that an alternative design could embed an
4843 associative container in a priority queue. Could, but most
4844 probably should not. To begin with, it should be noted that one
4845 could always encapsulate a priority queue and an associative
4846 container mapping values to priority queue iterators with no
4847 performance loss. One cannot, however, "un-encapsulate" a priority
4848 queue embedding an associative container, which might lead to
4849 performance loss. Assume, that one needs to associate each value
4850 with some data unrelated to priority queues. Then using
4851 this library's design, one could use an
4852 associative container mapping each value to a pair consisting of
4853 this data and a priority queue's iterator. Using the embedded
4854 method would need to use two associative containers. Similar
4855 problems might arise in cases where a value can reside
4856 simultaneously in many priority queues.</para>
4857
4858 </section>
4859
4860
4861 <section xml:id="container.priority_queue.details.d">
4862 <info><title>Underlying Data Structure</title></info>
4863
4864 <para>There are three main implementations of priority queues: the
4865 first employs a binary heap, typically one which uses a
4866 sequence; the second uses a tree (or forest of trees), which is
4867 typically less structured than an associative container's tree;
4868 the third simply uses an associative container. These are
4869 shown in the graphic below, in labels A1 and A2, label B, and label C.</para>
4870
4871 <figure>
4872 <title>Underlying Priority-Queue Data-Structures.</title>
4873 <mediaobject>
4874 <imageobject>
4875 <imagedata align="center" format="PNG" scale="100"
4876 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_priority_queue_different_underlying_dss.png"/>
4877 </imageobject>
4878 <textobject>
4879 <phrase>Underlying Priority-Queue Data-Structures.</phrase>
4880 </textobject>
4881 </mediaobject>
4882 </figure>
4883
4884 <para>Roughly speaking, any value that is both pushed and popped
4885 from a priority queue must incur a logarithmic expense (in the
4886 amortized sense). Any priority queue implementation that would
4887 avoid this, would violate known bounds on comparison-based
4888 sorting (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
4889 </para>
4890
4891 <para>Most implementations do
4892 not differ in the asymptotic amortized complexity of
4893 <function>push</function> and <function>pop</function> operations, but they differ in
4894 the constants involved, in the complexity of other operations
4895 (e.g., <function>modify</function>), and in the worst-case
4896 complexity of single operations. In general, the more
4897 "structured" an implementation (i.e., the more internal
4898 invariants it possesses) - the higher its amortized complexity
4899 of <function>push</function> and <function>pop</function> operations.</para>
4900
4901 <para>This library implements different algorithms using a
4902 single class: <classname>priority_queue</classname>.
4903 Instantiating the <classname>Tag</classname> template parameter, "selects"
4904 the implementation:</para>
4905
4906 <orderedlist>
4907 <listitem><para>
4908 Instantiating <classname>Tag = binary_heap_tag</classname> creates
4909 a binary heap of the form in represented in the graphic with labels A1 or A2. The former is internally
4910 selected by priority_queue
4911 if <classname>Value_Type</classname> is instantiated by a primitive type
4912 (e.g., an <type>int</type>); the latter is
4913 internally selected for all other types (e.g.,
4914 <classname>std::string</classname>). This implementations is relatively
4915 unstructured, and so has good <classname>push</classname> and <classname>pop</classname>
4916 performance; it is the "best-in-kind" for primitive
4917 types, e.g., <type>int</type>s. Conversely, it has
4918 high worst-case performance, and can support only linear-time
4919 <function>modify</function> and <function>erase</function> operations.</para></listitem>
4920
4921 <listitem><para>Instantiating <classname>Tag =
4922 pairing_heap_tag</classname> creates a pairing heap of the form
4923 in represented by label B in the graphic above. This
4924 implementations too is relatively unstructured, and so has good
4925 <function>push</function> and <function>pop</function>
4926 performance; it is the "best-in-kind" for non-primitive types,
4927 e.g., <classname>std:string</classname>s. It also has very good
4928 worst-case <function>push</function> and
4929 <function>join</function> performance (O(1)), but has high
4930 worst-case <function>pop</function>
4931 complexity.</para></listitem>
4932
4933 <listitem><para>Instantiating <classname>Tag =
4934 binomial_heap_tag</classname> creates a binomial heap of the
4935 form repsented by label B in the graphic above. This
4936 implementations is more structured than a pairing heap, and so
4937 has worse <function>push</function> and <function>pop</function>
4938 performance. Conversely, it has sub-linear worst-case bounds for
4939 <function>pop</function>, e.g., and so it might be preferred in
4940 cases where responsiveness is important.</para></listitem>
4941
4942 <listitem><para>Instantiating <classname>Tag =
4943 rc_binomial_heap_tag</classname> creates a binomial heap of the
4944 form represented in label B above, accompanied by a redundant
4945 counter which governs the trees. This implementations is
4946 therefore more structured than a binomial heap, and so has worse
4947 <function>push</function> and <function>pop</function>
4948 performance. Conversely, it guarantees O(1)
4949 <function>push</function> complexity, and so it might be
4950 preferred in cases where the responsiveness of a binomial heap
4951 is insufficient.</para></listitem>
4952
4953 <listitem><para>Instantiating <classname>Tag =
4954 thin_heap_tag</classname> creates a thin heap of the form
4955 represented by the label B in the graphic above. This
4956 implementations too is more structured than a pairing heap, and
4957 so has worse <function>push</function> and
4958 <function>pop</function> performance. Conversely, it has better
4959 worst-case and identical amortized complexities than a Fibonacci
4960 heap, and so might be more appropriate for some graph
4961 algorithms.</para></listitem>
4962 </orderedlist>
4963
4964 <para>Of course, one can use any order-preserving associative
4965 container as a priority queue, as in the graphic above label C, possibly by creating an adapter class
4966 over the associative container (much as
4967 <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
4968 This has the advantage that no cross-referencing is necessary
4969 at all; the priority queue itself is an associative container.
4970 Most associative containers are too structured to compete with
4971 priority queues in terms of <function>push</function> and <function>pop</function>
4972 performance.</para>
4973
4974
4975
4976 </section>
4977
4978 <section xml:id="container.priority_queue.details.traits">
4979 <info><title>Traits</title></info>
4980
4981 <para>It would be nice if all priority queues could
4982 share exactly the same behavior regardless of implementation. Sadly, this is not possible. Just one for instance is in join operations: joining
4983 two binary heaps might throw an exception (not corrupt
4984 any of the heaps on which it operates), but joining two pairing
4985 heaps is exception free.</para>
4986
4987 <para>Tags and traits are very useful for manipulating generic
4988 types. <classname>__gnu_pbds::priority_queue</classname>
4989 publicly defines <classname>container_category</classname> as one of the tags. Given any
4990 container <classname>Cntnr</classname>, the tag of the underlying
4991 data structure can be found via <classname>typename
4992 Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4993 </para>
4994
4995 <figure>
4996 <title>Priority-Queue Data-Structure Tags.</title>
4997 <mediaobject>
4998 <imageobject>
4999 <imagedata align="center" format="PNG" scale="100"
5000 fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_priority_queue_tag_hierarchy.png"/>
5001 </imageobject>
5002 <textobject>
5003 <phrase>Priority-Queue Data-Structure Tags.</phrase>
5004 </textobject>
5005 </mediaobject>
5006 </figure>
5007
5008
5009 <para>Additionally, a traits mechanism can be used to query a
5010 container type for its attributes. Given any container
5011 <classname>Cntnr</classname>, then <programlisting>__gnu_pbds::container_traits&lt;Cntnr&gt;</programlisting>
5012 is a traits class identifying the properties of the
5013 container.</para>
5014
5015 <para>To find if a container might throw if two of its objects are
5016 joined, one can use
5017 <programlisting>
5018 container_traits&lt;Cntnr&gt;::split_join_can_throw
5019 </programlisting>
5020 </para>
5021
5022 <para>
5023 Different priority-queue implementations have different invalidation guarantees. This is
5024 especially important, since there is no way to access an arbitrary
5025 value of priority queues except for iterators. Similarly to
5026 associative containers, one can use
5027 <programlisting>
5028 container_traits&lt;Cntnr&gt;::invalidation_guarantee
5029 </programlisting>
5030 to get the invalidation guarantee type of a priority queue.</para>
5031
5032 <para>It is easy to understand from the graphic above, what <classname>container_traits&lt;Cntnr&gt;::invalidation_guarantee</classname>
5033 will be for different implementations. All implementations of
5034 type represented by label B have <classname>point_invalidation_guarantee</classname>:
5035 the container can freely internally reorganize the nodes -
5036 range-type iterators are invalidated, but point-type iterators
5037 are always valid. Implementations of type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
5038 the container can freely internally reallocate the array - both
5039 point-type and range-type iterators might be invalidated.</para>
5040
5041 <para>
5042 This has major implications, and constitutes a good reason to avoid
5043 using binary heaps. A binary heap can perform <function>modify</function>
5044 or <function>erase</function> efficiently given a valid point-type
5045 iterator. However, in order to supply it with a valid point-type
5046 iterator, one needs to iterate (linearly) over all
5047 values, then supply the relevant iterator (recall that a
5048 range-type iterator can always be converted to a point-type
5049 iterator). This means that if the number of <function>modify</function> or
5050 <function>erase</function> operations is non-negligible (say
5051 super-logarithmic in the total sequence of operations) - binary
5052 heaps will perform badly.
5053 </para>
5054
5055 </section>
5056
5057 </section> <!-- details -->
5058
5059 </section> <!-- priority_queue -->
5060
5061
5062
5063 </section> <!-- container -->
5064
5065 </section> <!-- design -->
5066
5067
5068
5069 <!-- S04: Test -->
5070 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5071 href="test_policy_data_structures.xml">
5072 </xi:include>
5073
5074 <!-- S05: Reference/Acknowledgments -->
5075 <section xml:id="pbds.ack">
5076 <info><title>Acknowledgments</title></info>
5077 <?dbhtml filename="policy_data_structures_biblio.html"?>
5078
5079 <para>
5080 Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5081 Laboratories), and Benjamin Kosnik (Red Hat).
5082 </para>
5083
5084 <para>
5085 This library was partially written at
5086 <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.haifa.il.ibm.com/">IBM's Haifa Research Labs</link>.
5087 It is based heavily on policy-based design and uses many useful
5088 techniques from Modern C++ Design: Generic Programming and Design
5089 Patterns Applied by Andrei Alexandrescu.
5090 </para>
5091
5092 <para>
5093 Two ideas are borrowed from the SGI-STL implementation:
5094 </para>
5095
5096 <orderedlist>
5097 <listitem>
5098 <para>
5099 The prime-based resize policies use a list of primes taken from
5100 the SGI-STL implementation.
5101 </para>
5102 </listitem>
5103
5104 <listitem>
5105 <para>
5106 The red-black trees contain both a root node and a header node
5107 (containing metadata), connected in a way that forward and
5108 reverse iteration can be performed efficiently.
5109 </para>
5110 </listitem>
5111 </orderedlist>
5112
5113 <para>
5114 Some test utilities borrow ideas from
5115 <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.boost.org/doc/libs/release/libs/timer/index.html">boost::timer</link>.
5116 </para>
5117
5118 <para>
5119 We would like to thank Scott Meyers for useful comments (without
5120 attributing to him any flaws in the design or implementation of the
5121 library).
5122 </para>
5123 <para>We would like to thank Matt Austern for the suggestion to
5124 include tries.</para>
5125 </section>
5126
5127 <!-- S06: Biblio -->
5128 <bibliography xml:id="pbds.biblio">
5129 <info>
5130 <title>
5131 Bibliography
5132 </title>
5133 </info>
5134 <?dbhtml filename="policy_data_structures_biblio.html"?>
5135
5136 <!-- 01 -->
5137 <biblioentry xml:id="biblio.abrahams97exception">
5138 <title>
5139 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5140 xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1997/N1075.pdf">
5141 STL Exception Handling Contract
5142 </link>
5143 </title>
5144 <date>1997</date>
5145
5146 <author>
5147 <personname>
5148 <firstname>
5149 Dave
5150 </firstname>
5151 <surname>
5152 Abrahams
5153 </surname>
5154 </personname>
5155 </author>
5156
5157 <publisher>
5158 <publishername>
5159 ISO SC22/WG21
5160 </publishername>
5161 </publisher>
5162 </biblioentry>
5163
5164
5165 <!-- 02 -->
5166 <biblioentry xml:id="biblio.alexandrescu01modern">
5167 <title>
5168 Modern C++ Design: Generic Programming and Design Patterns Applied
5169 </title>
5170 <date>
5171 2001
5172 </date>
5173
5174 <author>
5175 <personname>
5176 <firstname>
5177 Andrei
5178 </firstname>
5179 <surname>
5180 Alexandrescu
5181 </surname>
5182 </personname>
5183 </author>
5184
5185 <publisher>
5186 <publishername>
5187 Addison-Wesley Publishing Company
5188 </publishername>
5189 </publisher>
5190 </biblioentry>
5191
5192
5193 <!-- 03 -->
5194 <biblioentry xml:id="biblio.andrew04mtf">
5195 <title>
5196 MTF, Bit, and COMB: A Guide to Deterministic and Randomized
5197 Algorithms for the List Update Problem
5198 </title>
5199
5200 <authorgroup>
5201 <author>
5202 <personname>
5203 <firstname>
5204 K.
5205 </firstname>
5206 <surname>
5207 Andrew
5208 </surname>
5209 </personname>
5210 </author>
5211
5212 <author>
5213 <personname>
5214 <firstname>
5215 D.
5216 </firstname>
5217 <surname>
5218 Gleich
5219 </surname>
5220 </personname>
5221 </author>
5222 </authorgroup>
5223 </biblioentry>
5224
5225 <!-- 04 -->
5226 <biblioentry xml:id="biblio.austern00noset">
5227 <title>
5228 Why You Shouldn't Use set - and What You Should Use Instead
5229 </title>
5230 <date>
5231 April, 2000
5232 </date>
5233
5234 <author>
5235 <personname>
5236 <firstname>
5237 Matthew
5238 </firstname>
5239 <surname>
5240 Austern
5241 </surname>
5242 </personname>
5243 </author>
5244
5245 <publisher>
5246 <publishername>
5247 C++ Report
5248 </publishername>
5249 </publisher>
5250 </biblioentry>
5251
5252 <!-- 05 -->
5253 <biblioentry xml:id="biblio.austern01htprop">
5254 <title>
5255 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5256 xlink:href="http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2001/n1326.html">
5257 A Proposal to Add Hashtables to the Standard Library
5258 </link>
5259 </title>
5260 <date>
5261 2001
5262 </date>
5263
5264 <author>
5265 <personname>
5266 <firstname>
5267 Matthew
5268 </firstname>
5269 <surname>
5270 Austern
5271 </surname>
5272 </personname>
5273 </author>
5274
5275 <publisher>
5276 <publishername>
5277 ISO SC22/WG21
5278 </publishername>
5279 </publisher>
5280 </biblioentry>
5281
5282 <!-- 06 -->
5283 <biblioentry xml:id="biblio.austern98segmentedit">
5284 <title>
5285 Segmented iterators and hierarchical algorithms
5286 </title>
5287 <date>
5288 April, 1998
5289 </date>
5290
5291 <author>
5292 <personname>
5293 <firstname>
5294 Matthew
5295 </firstname>
5296 <surname>
5297 Austern
5298 </surname>
5299 </personname>
5300 </author>
5301
5302 <publisher>
5303 <publishername>
5304 Generic Programming
5305 </publishername>
5306 </publisher>
5307 </biblioentry>
5308
5309 <!-- 07 -->
5310 <biblioentry xml:id="biblio.dawestimer">
5311 <title>
5312 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5313 xlink:href="www.boost.org/doc/libs/release/libs/timer/">
5314 Boost Timer Library
5315 </link>
5316 </title>
5317
5318 <author>
5319 <personname>
5320 <firstname>
5321 Beeman
5322 </firstname>
5323 <surname>
5324 Dawes
5325 </surname>
5326 </personname>
5327 </author>
5328
5329 <publisher>
5330 <publishername>
5331 Boost
5332 </publishername>
5333 </publisher>
5334 </biblioentry>
5335
5336 <!-- 08 -->
5337 <biblioentry xml:id="biblio.clearypool">
5338 <title>
5339 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5340 xlink:href="www.boost.org/doc/libs/release/libs/pool/">
5341 Boost Pool Library
5342 </link>
5343 </title>
5344
5345 <author>
5346 <personname>
5347 <firstname>
5348 Stephen
5349 </firstname>
5350 <surname>
5351 Cleary
5352 </surname>
5353 </personname>
5354 </author>
5355
5356 <publisher>
5357 <publishername>
5358 Boost
5359 </publishername>
5360 </publisher>
5361 </biblioentry>
5362
5363
5364 <!-- 09 -->
5365 <biblioentry xml:id="biblio.maddocktraits">
5366 <title>
5367 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5368 xlink:href="www.boost.org/doc/libs/release/libs/type_traits/">
5369 Boost Type Traits Library
5370 </link>
5371 </title>
5372 <authorgroup>
5373 <author>
5374 <personname>
5375 <firstname>
5376 Maddock
5377 </firstname>
5378 <surname>
5379 John
5380 </surname>
5381 </personname>
5382 </author>
5383 <author>
5384 <personname>
5385 <firstname>
5386 Stephen
5387 </firstname>
5388 <surname>
5389 Cleary
5390 </surname>
5391 </personname>
5392 </author>
5393 </authorgroup>
5394 <publisher>
5395 <publishername>
5396 Boost
5397 </publishername>
5398 </publisher>
5399 </biblioentry>
5400
5401 <!-- 10 -->
5402 <biblioentry xml:id="biblio.brodal96priority">
5403 <title>
5404 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5405 xlink:href="http://portal.acm.org/citation.cfm?id=313883">
5406 Worst-case efficient priority queues
5407 </link>
5408 </title>
5409
5410 <author>
5411 <personname>
5412 <firstname>
5413 Gerth
5414 </firstname>
5415 <surname>
5416 Stolting Brodal
5417 </surname>
5418 </personname>
5419 </author>
5420
5421 </biblioentry>
5422
5423 <!-- 11 -->
5424 <biblioentry xml:id="biblio.bulkamayheweff">
5425 <title>
5426 Efficient C++ Programming Techniques
5427 </title>
5428 <date>
5429 1997
5430 </date>
5431
5432 <authorgroup>
5433 <author>
5434 <personname>
5435 <firstname>
5436 D.
5437 </firstname>
5438 <surname>
5439 Bulka
5440 </surname>
5441 </personname>
5442 </author>
5443 <author>
5444 <personname>
5445 <firstname>
5446 D.
5447 </firstname>
5448 <surname>
5449 Mayhew
5450 </surname>
5451 </personname>
5452 </author>
5453 </authorgroup>
5454
5455 <publisher>
5456 <publishername>
5457 Addison-Wesley Publishing Company
5458 </publishername>
5459 </publisher>
5460 </biblioentry>
5461
5462 <!-- 12 -->
5463 <biblioentry xml:id="biblio.clrs2001">
5464 <title>
5465 Introduction to Algorithms, 2nd edition
5466 </title>
5467 <date>
5468 2001
5469 </date>
5470 <authorgroup>
5471 <author>
5472 <personname>
5473 <firstname>
5474 T. H.
5475 </firstname>
5476 <surname>
5477 Cormen
5478 </surname>
5479 </personname>
5480 </author>
5481
5482 <author>
5483 <personname>
5484 <firstname>
5485 C. E.
5486 </firstname>
5487 <surname>
5488 Leiserson
5489 </surname>
5490 </personname>
5491 </author>
5492
5493 <author>
5494 <personname>
5495 <firstname>
5496 R. L.
5497 </firstname>
5498 <surname>
5499 Rivest
5500 </surname>
5501 </personname>
5502 </author>
5503
5504 <author>
5505 <personname>
5506 <firstname>
5507 C.
5508 </firstname>
5509 <surname>
5510 Stein
5511 </surname>
5512 </personname>
5513 </author>
5514 </authorgroup>
5515 <publisher>
5516 <publishername>
5517 MIT Press
5518 </publishername>
5519 </publisher>
5520 </biblioentry>
5521
5522 <!-- 13 -->
5523 <biblioentry xml:id="biblio.dubhashi98neg">
5524 <title>
5525 Balls and bins: A study in negative dependence
5526 </title>
5527 <date>
5528 1998
5529 </date>
5530 <authorgroup>
5531 <author>
5532 <personname>
5533 <firstname>
5534 D.
5535 </firstname>
5536 <surname>
5537 Dubashi
5538 </surname>
5539 </personname>
5540 </author>
5541 <author>
5542 <personname>
5543 <firstname>
5544 D.
5545 </firstname>
5546 <surname>
5547 Ranjan
5548 </surname>
5549 </personname>
5550 </author>
5551 </authorgroup>
5552
5553 <publisher>
5554 <publishername>
5555 Random Structures and Algorithms 13
5556 </publishername>
5557 </publisher>
5558 </biblioentry>
5559
5560
5561 <!-- 14 -->
5562 <biblioentry xml:id="biblio.fagin79extendible">
5563 <title>
5564 Extendible hashing - a fast access method for dynamic files
5565 </title>
5566 <date>
5567 1979
5568 </date>
5569
5570 <authorgroup>
5571 <author>
5572 <personname>
5573 <firstname>
5574 R.
5575 </firstname>
5576 <surname>
5577 Fagin
5578 </surname>
5579 </personname>
5580 </author>
5581 <author>
5582 <personname>
5583 <firstname>
5584 J.
5585 </firstname>
5586 <surname>
5587 Nievergelt
5588 </surname>
5589 </personname>
5590 </author>
5591 <author>
5592 <personname>
5593 <firstname>
5594 N.
5595 </firstname>
5596 <surname>
5597 Pippenger
5598 </surname>
5599 </personname>
5600 </author>
5601 <author>
5602 <personname>
5603 <firstname>
5604 H. R.
5605 </firstname>
5606 <surname>
5607 Strong
5608 </surname>
5609 </personname>
5610 </author>
5611 </authorgroup>
5612
5613 <publisher>
5614 <publishername>
5615 ACM Trans. Database Syst. 4
5616 </publishername>
5617 </publisher>
5618 </biblioentry>
5619
5620
5621
5622 <!-- 15 -->
5623 <biblioentry xml:id="biblio.filliatre2000ptset">
5624 <title>
5625 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5626 xlink:href="http://cristal.inria.fr/~frisch/icfp06_contest/advtr/applyOmatic/ptset.ml">
5627 Ptset: Sets of integers implemented as Patricia trees
5628 </link>
5629 </title>
5630
5631 <date>
5632 2000
5633 </date>
5634
5635 <author>
5636 <personname>
5637 <firstname>
5638 Jean-Christophe
5639 </firstname>
5640 <surname>
5641 Filliatre
5642 </surname>
5643 </personname>
5644 </author>
5645 </biblioentry>
5646
5647
5648
5649 <!-- 16 -->
5650 <biblioentry xml:id="biblio.fredman86pairing">
5651 <title>
5652 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5653 xlink:href="http://www.cs.cmu.edu/~sleator/papers/pairing-heaps.pdf">
5654 The pairing heap: a new form of self-adjusting heap
5655 </link>
5656 </title>
5657 <date>
5658 1986
5659 </date>
5660 <authorgroup>
5661 <author>
5662 <personname>
5663 <firstname>
5664 M. L.
5665 </firstname>
5666 <surname>
5667 Fredman
5668 </surname>
5669 </personname>
5670 </author>
5671 <author>
5672 <personname>
5673 <firstname>
5674 R.
5675 </firstname>
5676 <surname>
5677 Sedgewick
5678 </surname>
5679 </personname>
5680 </author>
5681 <author>
5682 <personname>
5683 <firstname>
5684 D. D.
5685 </firstname>
5686 <surname>
5687 Sleator
5688 </surname>
5689 </personname>
5690 </author>
5691 <author>
5692 <personname>
5693 <firstname>
5694 R. E.
5695 </firstname>
5696 <surname>
5697 Tarjan
5698 </surname>
5699 </personname>
5700 </author>
5701 </authorgroup>
5702 </biblioentry>
5703
5704
5705 <!-- 17 -->
5706 <biblioentry xml:id="biblio.gof">
5707 <title>
5708 Design Patterns - Elements of Reusable Object-Oriented Software
5709 </title>
5710 <date>
5711 1995
5712 </date>
5713 <authorgroup>
5714 <author>
5715 <personname>
5716 <firstname>
5717 E.
5718 </firstname>
5719 <surname>
5720 Gamma
5721 </surname>
5722 </personname>
5723 </author>
5724 <author>
5725 <personname>
5726 <firstname>
5727 R.
5728 </firstname>
5729 <surname>
5730 Helm
5731 </surname>
5732 </personname>
5733 </author>
5734 <author>
5735 <personname>
5736 <firstname>
5737 R.
5738 </firstname>
5739 <surname>
5740 Johnson
5741 </surname>
5742 </personname>
5743 </author>
5744 <author>
5745 <personname>
5746 <firstname>
5747 J.
5748 </firstname>
5749 <surname>
5750 Vlissides
5751 </surname>
5752 </personname>
5753 </author>
5754 </authorgroup>
5755 <publisher>
5756 <publishername>
5757 Addison-Wesley Publishing Company
5758 </publishername>
5759 </publisher>
5760 </biblioentry>
5761
5762
5763 <!-- 18 -->
5764 <biblioentry xml:id="biblio.garg86order">
5765 <title>
5766 Order-preserving key transformations
5767 </title>
5768 <date>
5769 1986
5770 </date>
5771 <authorgroup>
5772 <author>
5773 <personname>
5774 <firstname>
5775 A. K.
5776 </firstname>
5777 <surname>
5778 Garg
5779 </surname>
5780 </personname>
5781 </author>
5782 <author>
5783 <personname>
5784 <firstname>
5785 C. C.
5786 </firstname>
5787 <surname>
5788 Gotlieb
5789 </surname>
5790 </personname>
5791 </author>
5792 </authorgroup>
5793
5794 <publisher>
5795 <publishername>
5796 Trans. Database Syst. 11
5797 </publishername>
5798 </publisher>
5799 </biblioentry>
5800
5801 <!-- 19 -->
5802 <biblioentry xml:id="biblio.hyslop02making">
5803 <title>
5804 Making a real hash of things
5805 </title>
5806 <date>
5807 May 2002
5808 </date>
5809 <authorgroup>
5810 <author>
5811 <personname>
5812 <firstname>
5813 J.
5814 </firstname>
5815 <surname>
5816 Hyslop
5817 </surname>
5818 </personname>
5819 </author>
5820 <author>
5821 <personname>
5822 <firstname>
5823 Herb
5824 </firstname>
5825 <surname>
5826 Sutter
5827 </surname>
5828 </personname>
5829 </author>
5830 </authorgroup>
5831
5832 <publisher>
5833 <publishername>
5834 C++ Report
5835 </publishername>
5836 </publisher>
5837 </biblioentry>
5838
5839
5840 <!-- 20 -->
5841 <biblioentry xml:id="biblio.jossutis01stl">
5842 <title>
5843 The C++ Standard Library - A Tutorial and Reference
5844 </title>
5845 <date>
5846 2001
5847 </date>
5848
5849 <author>
5850 <personname>
5851 <firstname>
5852 N. M.
5853 </firstname>
5854 <surname>
5855 Jossutis
5856 </surname>
5857 </personname>
5858 </author>
5859 <publisher>
5860 <publishername>
5861 Addison-Wesley Publishing Company
5862 </publishername>
5863 </publisher>
5864 </biblioentry>
5865
5866 <!-- 21 -->
5867 <biblioentry xml:id="biblio.kt99fat_heaps">
5868 <title>
5869 <link xmlns:xlink="http://www.w3.org/1999/xlink"
5870 xlink:href="http://www.cs.princeton.edu/research/techreps/TR-597-99">
5871 New Heap Data Structures
5872 </link>
5873 </title>
5874 <date>
5875 1999
5876 </date>
5877
5878 <authorgroup>
5879 <author>
5880 <personname>
5881 <firstname>
5882 Haim
5883 </firstname>
5884 <surname>
5885 Kaplan
5886 </surname>
5887 </personname>
5888 </author>
5889 <author>
5890 <personname>
5891 <firstname>
5892 Robert E.
5893 </firstname>
5894 <surname>
5895 Tarjan
5896 </surname>
5897 </personname>
5898 </author>
5899 </authorgroup>
5900 </biblioentry>
5901
5902
5903 <!-- 22 -->
5904 <biblioentry xml:id="biblio.kleft00sets">
5905 <title>
5906 Are Set Iterators Mutable or Immutable?
5907 </title>
5908 <date>
5909 October 2000
5910 </date>
5911 <authorgroup>
5912 <author>
5913 <personname>
5914 <firstname>
5915 Angelika
5916 </firstname>
5917 <surname>
5918 Langer
5919 </surname>
5920 </personname>
5921 </author>
5922
5923 <author>
5924 <personname>
5925 <firstname>
5926 Klaus
5927 </firstname>
5928 <surname>
5929 Kleft
5930 </surname>
5931 </personname>
5932 </author>
5933 </authorgroup>
5934
5935 <publisher>
5936 <publishername>
5937 C/C++ Users Jornal
5938 </publishername>
5939 </publisher>
5940 </biblioentry>
5941
5942 <!-- 23 -->
5943 <biblioentry xml:id="biblio.knuth98sorting">
5944 <title>
5945 The Art of Computer Programming - Sorting and Searching
5946 </title>
5947 <date>
5948 1998
5949 </date>
5950
5951 <author>
5952 <personname>
5953 <firstname>
5954 D. E.
5955 </firstname>
5956 <surname>
5957 Knuth
5958 </surname>
5959 </personname>
5960 </author>
5961
5962 <publisher>
5963 <publishername>
5964 Addison-Wesley Publishing Company
5965 </publishername>
5966 </publisher>
5967 </biblioentry>
5968
5969 <!-- 24 -->
5970 <biblioentry xml:id="biblio.liskov98data">
5971 <title>
5972 Data abstraction and hierarchy
5973 </title>
5974 <date>
5975 May 1998
5976 </date>
5977
5978 <author>
5979 <personname>
5980 <firstname>
5981 B.
5982 </firstname>
5983 <surname>
5984 Liskov
5985 </surname>
5986 </personname>
5987 </author>
5988
5989 <publisher>
5990 <publishername>
5991 SIGPLAN Notices 23
5992 </publishername>
5993 </publisher>
5994 </biblioentry>
5995
5996 <!-- 25 -->
5997 <biblioentry xml:id="biblio.litwin80lh">
5998 <title>
5999 Linear hashing: A new tool for file and table addressing
6000 </title>
6001 <date>
6002 June 1980
6003 </date>
6004
6005 <author>
6006 <personname>
6007 <firstname>
6008 W.
6009 </firstname>
6010 <surname>
6011 Litwin
6012 </surname>
6013 </personname>
6014 </author>
6015
6016 <publisher>
6017 <publishername>
6018 Proceedings of International Conference on Very Large Data Bases
6019 </publishername>
6020 </publisher>
6021 </biblioentry>
6022
6023 <!-- 26 -->
6024 <biblioentry xml:id="biblio.maverik_lowerbounds">
6025 <title>
6026 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6027 xlink:href="http://magic.aladdin.cs.cmu.edu/2005/08/01/deamortization-part-2-binomial-heaps">
6028 Deamortization - Part 2: Binomial Heaps
6029 </link>
6030 </title>
6031 <date>
6032 2005
6033 </date>
6034
6035 <author>
6036 <personname>
6037 <firstname>
6038 Maverik
6039 </firstname>
6040 <surname>
6041 Woo
6042 </surname>
6043 </personname>
6044 </author>
6045 </biblioentry>
6046
6047 <!-- 27 -->
6048 <biblioentry xml:id="biblio.meyers96more">
6049 <title>
6050 More Effective C++: 35 New Ways to Improve Your Programs and Designs
6051 </title>
6052 <date>
6053 1996
6054 </date>
6055
6056 <author>
6057 <personname>
6058 <firstname>
6059 Scott
6060 </firstname>
6061 <surname>
6062 Meyers
6063 </surname>
6064 </personname>
6065 </author>
6066
6067 <publisher>
6068 <publishername>
6069 Addison-Wesley Publishing Company
6070 </publishername>
6071 </publisher>
6072 </biblioentry>
6073
6074 <!-- 28 -->
6075 <biblioentry xml:id="biblio.meyers00nonmember">
6076 <title>
6077 How Non-Member Functions Improve Encapsulation
6078 </title>
6079 <date>
6080 2000
6081 </date>
6082
6083 <author>
6084 <personname>
6085 <firstname>
6086 Scott
6087 </firstname>
6088 <surname>
6089 Meyers
6090 </surname>
6091 </personname>
6092 </author>
6093
6094 <publisher>
6095 <publishername>
6096 C/C++ Users Journal
6097 </publishername>
6098 </publisher>
6099 </biblioentry>
6100
6101 <!-- 29 -->
6102 <biblioentry xml:id="biblio.meyers01stl">
6103 <title>
6104 Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library
6105 </title>
6106 <date>
6107 2001
6108 </date>
6109
6110 <author>
6111 <personname>
6112 <firstname>
6113 Scott
6114 </firstname>
6115 <surname>
6116 Meyers
6117 </surname>
6118 </personname>
6119 </author>
6120
6121 <publisher>
6122 <publishername>
6123 Addison-Wesley Publishing Company
6124 </publishername>
6125 </publisher>
6126 </biblioentry>
6127
6128 <!-- 30 -->
6129 <biblioentry xml:id="biblio.meyers02both">
6130 <title>
6131 Class Template, Member Template - or Both?
6132 </title>
6133 <date>
6134 2003
6135 </date>
6136
6137 <author>
6138 <personname>
6139 <firstname>
6140 Scott
6141 </firstname>
6142 <surname>
6143 Meyers
6144 </surname>
6145 </personname>
6146 </author>
6147
6148 <publisher>
6149 <publishername>
6150 C/C++ Users Journal
6151 </publishername>
6152 </publisher>
6153 </biblioentry>
6154
6155 <!-- 31 -->
6156 <biblioentry xml:id="biblio.motwani95random">
6157 <title>
6158 Randomized Algorithms
6159 </title>
6160 <date>
6161 2003
6162 </date>
6163 <authorgroup>
6164 <author>
6165 <personname>
6166 <firstname>
6167 R.
6168 </firstname>
6169 <surname>
6170 Motwani
6171 </surname>
6172 </personname>
6173 </author>
6174 <author>
6175 <personname>
6176 <firstname>
6177 P.
6178 </firstname>
6179 <surname>
6180 Raghavan
6181 </surname>
6182 </personname>
6183 </author>
6184 </authorgroup>
6185 <publisher>
6186 <publishername>
6187 Cambridge University Press
6188 </publishername>
6189 </publisher>
6190 </biblioentry>
6191
6192
6193 <!-- 32 -->
6194 <biblioentry xml:id="biblio.mscom">
6195 <title>
6196 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6197 xlink:href="http://www.microsoft.com/com">
6198 COM: Component Model Object Technologies
6199 </link>
6200 </title>
6201 <publisher>
6202 <publishername>
6203 Microsoft
6204 </publishername>
6205 </publisher>
6206 </biblioentry>
6207
6208 <!-- 33 -->
6209 <biblioentry xml:id="biblio.musser95rationale">
6210 <title>
6211 Rationale for Adding Hash Tables to the C++ Standard Template Library
6212 </title>
6213 <date>
6214 1995
6215 </date>
6216
6217 <author>
6218 <personname>
6219 <firstname>
6220 David R.
6221 </firstname>
6222 <surname>
6223 Musser
6224 </surname>
6225 </personname>
6226 </author>
6227
6228 </biblioentry>
6229
6230 <!-- 35 -->
6231 <biblioentry xml:id="biblio.musser96stltutorial">
6232 <title>
6233 STL Tutorial and Reference Guide
6234 </title>
6235 <date>
6236 1996
6237 </date>
6238
6239 <authorgroup>
6240 <author>
6241 <personname>
6242 <firstname>
6243 David R.
6244 </firstname>
6245 <surname>
6246 Musser
6247 </surname>
6248 </personname>
6249 </author>
6250 <author>
6251 <personname>
6252 <firstname>
6253 A.
6254 </firstname>
6255 <surname>
6256 Saini
6257 </surname>
6258 </personname>
6259 </author>
6260 </authorgroup>
6261 <publisher>
6262 <publishername>
6263 Addison-Wesley Publishing Company
6264 </publishername>
6265 </publisher>
6266
6267 </biblioentry>
6268
6269
6270 <!-- 36 -->
6271 <biblioentry xml:id="biblio.nelson96stlpq">
6272 <title>
6273 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6274 xlink:href="http://www.dogma.net/markn/articles/pq_stl/priority.htm">Priority Queues and the STL
6275 </link>
6276 </title>
6277 <date>
6278 January 1996
6279 </date>
6280
6281 <author>
6282 <personname>
6283 <firstname>
6284 Mark
6285 </firstname>
6286 <surname>
6287 Nelson
6288 </surname>
6289 </personname>
6290 </author>
6291
6292 <publisher>
6293 <publishername>
6294 Dr. Dobbs Journal
6295 </publishername>
6296 </publisher>
6297 </biblioentry>
6298
6299
6300 <!-- 37 -->
6301 <biblioentry xml:id="biblio.okasaki98mereable">
6302 <title>
6303 Fast mergeable integer maps
6304 </title>
6305 <date>
6306 September 1998
6307 </date>
6308 <authorgroup>
6309 <author>
6310 <personname>
6311 <firstname>
6312 C.
6313 </firstname>
6314 <surname>
6315 Okasaki
6316 </surname>
6317 </personname>
6318 </author>
6319 <author>
6320 <personname>
6321 <firstname>
6322 A.
6323 </firstname>
6324 <surname>
6325 Gill
6326 </surname>
6327 </personname>
6328 </author>
6329 </authorgroup>
6330 <publisher>
6331 <publishername>
6332 In Workshop on ML
6333 </publishername>
6334 </publisher>
6335 </biblioentry>
6336
6337 <!-- 38 -->
6338 <biblioentry xml:id="biblio.sgi_stl">
6339 <title>
6340 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6341 xlink:href="http://www.sgi.com/tech/stl">
6342 Standard Template Library Programmer's Guide
6343 </link>
6344 </title>
6345 <author>
6346 <personname>
6347 <firstname>
6348 Matt
6349 </firstname>
6350 <surname>
6351 Austern
6352 </surname>
6353 </personname>
6354 </author>
6355
6356 <publisher>
6357 <publishername>
6358 SGI
6359 </publishername>
6360 </publisher>
6361 </biblioentry>
6362
6363 <!-- 39 -->
6364 <biblioentry xml:id="biblio.select_man">
6365 <title>
6366 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6367 xlink:href="http://www.scit.wlv.ac.uk/cgi-bin/mansec?3C+select">
6368 select
6369 </link>
6370 </title>
6371 </biblioentry>
6372
6373
6374 <!-- 40 -->
6375 <biblioentry xml:id="biblio.sleator84amortized">
6376 <title>
6377 Amortized Efficiency of List Update Problems
6378 </title>
6379 <date>
6380 1984
6381 </date>
6382 <authorgroup>
6383 <author>
6384 <personname>
6385 <firstname>
6386 D. D.
6387 </firstname>
6388 <surname>
6389 Sleator
6390 </surname>
6391 </personname>
6392 </author>
6393
6394 <author>
6395 <personname>
6396 <firstname>
6397 R. E.
6398 </firstname>
6399 <surname>
6400 Tarjan
6401 </surname>
6402 </personname>
6403 </author>
6404 </authorgroup>
6405
6406 <publisher>
6407 <publishername>
6408 ACM Symposium on Theory of Computing
6409 </publishername>
6410 </publisher>
6411 </biblioentry>
6412
6413 <!-- 41 -->
6414 <biblioentry xml:id="biblio.sleator85self">
6415 <title>
6416 Self-Adjusting Binary Search Trees
6417 </title>
6418 <date>
6419 1985
6420 </date>
6421
6422 <authorgroup>
6423 <author>
6424 <personname>
6425 <firstname>
6426 D. D.
6427 </firstname>
6428 <surname>
6429 Sleator
6430 </surname>
6431 </personname>
6432 </author>
6433
6434 <author>
6435 <personname>
6436 <firstname>
6437 R. E.
6438 </firstname>
6439 <surname>
6440 Tarjan
6441 </surname>
6442 </personname>
6443 </author>
6444 </authorgroup>
6445
6446 <publisher>
6447 <publishername>
6448 ACM Symposium on Theory of Computing
6449 </publishername>
6450 </publisher>
6451 </biblioentry>
6452
6453 <!-- 42 -->
6454 <biblioentry xml:id="biblio.stepanov94standard">
6455 <title>
6456 The Standard Template Library
6457 </title>
6458 <date>
6459 1984
6460 </date>
6461 <authorgroup>
6462 <author>
6463 <personname>
6464 <firstname>
6465 A. A.
6466 </firstname>
6467 <surname>
6468 Stepanov
6469 </surname>
6470 </personname>
6471 </author>
6472 <author>
6473 <personname>
6474 <firstname>
6475 M.
6476 </firstname>
6477 <surname>
6478 Lee
6479 </surname>
6480 </personname>
6481 </author>
6482 </authorgroup>
6483 </biblioentry>
6484
6485 <!-- 43 -->
6486 <biblioentry xml:id="biblio.stroustrup97cpp">
6487 <title>
6488 The C++ Programming Langugage
6489 </title>
6490 <date>
6491 1997
6492 </date>
6493
6494 <author>
6495 <personname>
6496 <firstname>
6497 Bjarne
6498 </firstname>
6499 <surname>
6500 Stroustrup
6501 </surname>
6502 </personname>
6503 </author>
6504
6505 <publisher>
6506 <publishername>
6507 Addison-Wesley Publishing Company
6508 </publishername>
6509 </publisher>
6510 </biblioentry>
6511
6512 <!-- 44 -->
6513 <biblioentry xml:id="biblio.vandevoorde2002cpptemplates">
6514 <title>
6515 C++ Templates: The Complete Guide
6516 </title>
6517 <date>
6518 2002
6519 </date>
6520 <authorgroup>
6521 <author>
6522 <personname>
6523 <firstname>
6524 D.
6525 </firstname>
6526 <surname>
6527 Vandevoorde
6528 </surname>
6529 </personname>
6530 </author>
6531
6532 <author>
6533 <personname>
6534 <firstname>
6535 N. M.
6536 </firstname>
6537 <surname>
6538 Josuttis
6539 </surname>
6540 </personname>
6541 </author>
6542 </authorgroup>
6543 <publisher>
6544 <publishername>
6545 Addison-Wesley Publishing Company
6546 </publishername>
6547 </publisher>
6548 </biblioentry>
6549
6550
6551 <!-- 45 -->
6552 <biblioentry xml:id="biblio.wickland96thirty">
6553 <title>
6554 <link xmlns:xlink="http://www.w3.org/1999/xlink"
6555 xlink:href="http://myweb.wvnet.edu/~gsa00121/books/amongdead30.zip">
6556 Thirty Years Among the Dead
6557 </link>
6558 </title>
6559 <date>
6560 1996
6561 </date>
6562
6563 <author>
6564 <personname>
6565 <firstname>
6566 C. A.
6567 </firstname>
6568 <surname>
6569 Wickland
6570 </surname>
6571 </personname>
6572 </author>
6573
6574 <publisher>
6575 <publishername>
6576 National Psychological Institute
6577 </publishername>
6578 </publisher>
6579 </biblioentry>
6580
6581
6582 </bibliography>
6583
6584 </chapter>