libstdc++-v3/doc/xml/manual/policy_data_structures.xml

   1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
   2          xml:id="manual.ext.containers.pbds" xreflabel="pbds">
   3   <info>
   4     <title>Policy-Based Data Structures</title>
   5     <keywordset>
   6       <keyword>
   7         ISO C++
   8       </keyword>
   9       <keyword>
  10         policy
  11       </keyword>
  12       <keyword>
  13         container
  14       </keyword>
  15       <keyword>
  16         data
  17       </keyword>
  18       <keyword>
  19         structure
  20       </keyword>
  21       <keyword>
  22         associated
  23       </keyword>
  24       <keyword>
  25         tree
  26       </keyword>
  27       <keyword>
  28         trie
  29       </keyword>
  30       <keyword>
  31         hash
  32       </keyword>
  33       <keyword>
  34         metaprogramming
  35       </keyword>
  36     </keywordset>
  37   </info>
  38   <?dbhtml filename="policy_data_structures.html"?>
  39
  40   <!-- 2006-04-01 Ami Tavory -->
  41   <!-- 2011-05-25 Benjamin Kosnik -->
  42
  43   <!-- S01: intro -->
  44   <section xml:id="pbds.intro">
  45     <info><title>Intro</title></info>
  46
  47     <para>
  48       This is a library of policy-based elementary data structures:
  49       associative containers and priority queues. It is designed for
  50       high-performance, flexibility, semantic safety, and conformance to
  51       the corresponding containers in <literal>std</literal> and
  52       <literal>std::tr1</literal> (except for some points where it differs
  53       by design).
  54     </para>
  55     <para>
  56     </para>
  57
  58     <section xml:id="pbds.intro.issues">
  59       <info><title>Performance Issues</title></info>
  60       <para>
  61       </para>
  62
  63       <para>
  64         An attempt is made to categorize the wide variety of possible
  65         container designs in terms of performance-impacting factors. These
  66         performance factors are translated into design policies and
  67         incorporated into container design.
  68       </para>
  69
  70       <para>
  71         There is tension between unravelling factors into a coherent set of
  72         policies. Every attempt is made to make a minimal set of
  73         factors. However, in many cases multiple factors make for long
  74         template names. Every attempt is made to alias and use typedefs in
  75         the source files, but the generated names for external symbols can
  76         be large for binary files or debuggers.
  77       </para>
  78
  79       <para>
  80         In many cases, the longer names allow capabilities and behaviours
  81         controlled by macros to also be unamibiguously emitted as distinct
  82         generated names.
  83       </para>
  84
  85       <para>
  86         Specific issues found while unraveling performance factors in the
  87         design of associative containers and priority queues follow.
  88       </para>
  89
  90       <section xml:id="pbds.intro.issues.associative">
  91         <info><title>Associative</title></info>
  92
  93         <para>
  94           Associative containers depend on their composite policies to a very
  95           large extent. Implicitly hard-wiring policies can hamper their
  96           performance and limit their functionality. An efficient hash-based
  97           container, for example, requires policies for testing key
  98           equivalence, hashing keys, translating hash values into positions
  99           within the hash table, and determining when and how to resize the
 100           table internally. A tree-based container can efficiently support
 101           order statistics, i.e. the ability to query what is the order of
 102           each key within the sequence of keys in the container, but only if
 103           the container is supplied with a policy to internally update
 104           meta-data. There are many other such examples.
 105         </para>
 106
 107         <para>
 108           Ideally, all associative containers would share the same
 109           interface. Unfortunately, underlying data structures and mapping
 110           semantics differentiate between different containers. For example,
 111           suppose one writes a generic function manipulating an associative
 112           container.
 113         </para>
 114
 115         <programlisting>
 116           template&lt;typename Cntnr&gt;
 117           void
 118           some_op_sequence(Cntnr&amp; r_cnt)
 119           {
 120           ...
 121           }
 122         </programlisting>
 123
 124         <para>
 125           Given this, then what can one assume about the instantiating
 126           container? The answer varies according to its underlying data
 127           structure. If the underlying data structure of
 128           <literal>Cntnr</literal> is based on a tree or trie, then the order
 129           of elements is well defined; otherwise, it is not, in general. If
 130           the underlying data structure of <literal>Cntnr</literal> is based
 131           on a collision-chaining hash table, then modifying
 132           r_<literal>Cntnr</literal> will not invalidate its iterators' order;
 133           if the underlying data structure is a probing hash table, then this
 134           is not the case. If the underlying data structure is based on a tree
 135           or trie, then a reference to the container can efficiently be split;
 136           otherwise, it cannot, in general. If the underlying data structure
 137           is a red-black tree, then splitting a reference to the container is
 138           exception-free; if it is an ordered-vector tree, exceptions can be
 139           thrown.
 140         </para>
 141
 142       </section>
 143
 144       <section xml:id="pbds.intro.issues.priority_queue">
 145         <info><title>Priority Que</title></info>
 146
 147         <para>
 148           Priority queues are useful when one needs to efficiently access a
 149           minimum (or maximum) value as the set of values changes.
 150         </para>
 151
 152         <para>
 153           Most useful data structures for priority queues have a relatively
 154           simple structure, as they are geared toward relatively simple
 155           requirements. Unfortunately, these structures do not support access
 156           to an arbitrary value, which turns out to be necessary in many
 157           algorithms. Say, decreasing an arbitrary value in a graph
 158           algorithm. Therefore, some extra mechanism is necessary and must be
 159           invented for accessing arbitrary values. There are at least two
 160           alternatives: embedding an associative container in a priority
 161           queue, or allowing cross-referencing through iterators. The first
 162           solution adds significant overhead; the second solution requires a
 163           precise definition of iterator invalidation. Which is the next
 164           point...
 165         </para>
 166
 167         <para>
 168           Priority queues, like hash-based containers, store values in an
 169           order that is meaningless and undefined externally. For example, a
 170           <code>push</code> operation can internally reorganize the
 171           values. Because of this characteristic, describing a priority
 172           queues' iterator is difficult: on one hand, the values to which
 173           iterators point can remain valid, but on the other, the logical
 174           order of iterators can change unpredictably.
 175         </para>
 176
 177         <para>
 178           Roughly speaking, any element that is both inserted to a priority
 179           queue (e.g. through <code>push</code>) and removed
 180           from it (e.g., through <code>pop</code>), incurs a
 181           logarithmic overhead (in the amortized sense). Different underlying
 182           data structures place the actual cost differently: some are
 183           optimized for amortized complexity, whereas others guarantee that
 184           specific operations only have a constant cost. One underlying data
 185           structure might be chosen if modifying a value is frequent
 186           (Dijkstra's shortest-path algorithm), whereas a different one might
 187           be chosen otherwise. Unfortunately, an array-based binary heap - an
 188           underlying data structure that optimizes (in the amortized sense)
 189           <code>push</code> and <code>pop</code> operations, differs from the
 190           others in terms of its invalidation guarantees. Other design
 191           decisions also impact the cost and placement of the overhead, at the
 192           expense of more difference in the the kinds of operations that the
 193           underlying data structure can support. These differences pose a
 194           challenge when creating a uniform interface for priority queues.
 195         </para>
 196       </section>
 197     </section>
 198
 199     <section xml:id="pbds.intro.motivation">
 200       <info><title>Goals</title></info>
 201
 202       <para>
 203         Many fine associative-container libraries were already written,
 204         most notably, the C++ standard's associative containers. Why
 205         then write another library? This section shows some possible
 206         advantages of this library, when considering the challenges in
 207         the introduction. Many of these points stem from the fact that
 208         the ISO C++ process introduced associative-containers in a
 209         two-step process (first standardizing tree-based containers,
 210         only then adding hash-based containers, which are fundamentally
 211         different), did not standardize priority queues as containers,
 212         and (in our opinion) overloads the iterator concept.
 213       </para>
 214
 215       <section xml:id="pbds.intro.motivation.associative">
 216         <info><title>Associative</title></info>
 217         <para>
 218         </para>
 219
 220         <section xml:id="motivation.associative.policy">
 221           <info><title>Policy Choices</title></info>
 222           <para>
 223             Associative containers require a relatively large number of
 224             policies to function efficiently in various settings. In some
 225             cases this is needed for making their common operations more
 226             efficient, and in other cases this allows them to support a
 227             larger set of operations
 228           </para>
 229
 230           <orderedlist>
 231             <listitem>
 232               <para>
 233                 Hash-based containers, for example, support look-up and
 234                 insertion methods (<function>find</function> and
 235                 <function>insert</function>). In order to locate elements
 236                 quickly, they are supplied a hash functor, which instruct
 237                 how to transform a key object into some size type; a hash
 238                 functor might transform <constant>"hello"</constant>
 239                 into <constant>1123002298</constant>. A hash table, though,
 240                 requires transforming each key object into some size-type
 241                 type in some specific domain; a hash table with a 128-long
 242                 table might transform <constant>"hello"</constant> into
 243                 position <constant>63</constant>. The policy by which the
 244                 hash value is transformed into a position within the table
 245                 can dramatically affect performance.  Hash-based containers
 246                 also do not resize naturally (as opposed to tree-based
 247                 containers, for example). The appropriate resize policy is
 248                 unfortunately intertwined with the policy that transforms
 249                 hash value into a position within the table.
 250               </para>
 251             </listitem>
 252
 253             <listitem>
 254               <para>
 255                 Tree-based containers, for example, also support look-up and
 256                 insertion methods, and are primarily useful when maintaining
 257                 order between elements is important. In some cases, though,
 258                 one can utilize their balancing algorithms for completely
 259                 different purposes.
 260               </para>
 261
 262               <para>
 263                 Figure A shows a tree whose each node contains two entries:
 264                 a floating-point key, and some size-type
 265                 <emphasis>metadata</emphasis> (in bold beneath it) that is
 266                 the number of nodes in the sub-tree. (The root has key 0.99,
 267                 and has 5 nodes (including itself) in its sub-tree.) A
 268                 container based on this data structure can obviously answer
 269                 efficiently whether 0.3 is in the container object, but it
 270                 can also answer what is the order of 0.3 among all those in
 271                 the container object: see <xref linkend="biblio.clrs2001"/>.
 272
 273               </para>
 274
 275               <para>
 276                 As another example, Figure B shows a tree whose each node
 277                 contains two entries: a half-open geometric line interval,
 278                 and a number <emphasis>metadata</emphasis> (in bold beneath
 279                 it) that is the largest endpoint of all intervals in its
 280                 sub-tree.  (The root describes the interval <constant>[20,
 281                 36)</constant>, and the largest endpoint in its sub-tree is
 282                 99.) A container based on this data structure can obviously
 283                 answer efficiently whether <constant>[3, 41)</constant> is
 284                 in the container object, but it can also answer efficiently
 285                 whether the container object has intervals that intersect
 286                 <constant>[3, 41)</constant>. These types of queries are
 287                 very useful in geometric algorithms and lease-management
 288                 algorithms.
 289               </para>
 290
 291               <para>
 292                 It is important to note, however, that as the trees are
 293                 modified, their internal structure changes. To maintain
 294                 these invariants, one must supply some policy that is aware
 295                 of these changes.  Without this, it would be better to use a
 296                 linked list (in itself very efficient for these purposes).
 297               </para>
 298
 299             </listitem>
 300           </orderedlist>
 301
 302           <figure>
 303             <title>Node Invariants</title>
 304             <mediaobject>
 305               <imageobject>
 306                 <imagedata align="center" format="PNG" scale="100"
 307                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_node_invariants.png"/>
 308               </imageobject>
 309               <textobject>
 310                 <phrase>Node Invariants</phrase>
 311               </textobject>
 312             </mediaobject>
 313           </figure>
 314
 315         </section>
 316
 317         <section xml:id="motivation.associative.underlying">
 318           <info><title>Underlying Data Structures</title></info>
 319           <para>
 320             The standard C++ library contains associative containers based on
 321             red-black trees and collision-chaining hash tables. These are
 322             very useful, but they are not ideal for all types of
 323             settings.
 324           </para>
 325
 326           <para>
 327             The figure below shows the different underlying data structures
 328             currently supported in this library.
 329           </para>
 330
 331           <figure>
 332             <title>Underlying Associative Data Structures</title>
 333             <mediaobject>
 334               <imageobject>
 335                 <imagedata align="center" format="PNG" scale="100"
 336                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_different_underlying_dss_1.png"/>
 337               </imageobject>
 338               <textobject>
 339                 <phrase>Underlying Associative Data Structures</phrase>
 340               </textobject>
 341             </mediaobject>
 342           </figure>
 343
 344           <para>
 345             A shows a collision-chaining hash-table, B shows a probing
 346             hash-table, C shows a red-black tree, D shows a splay tree, E shows
 347             a tree based on an ordered vector(implicit in the order of the
 348             elements), F shows a PATRICIA trie, and G shows a list-based
 349             container with update policies.
 350           </para>
 351
 352           <para>
 353             Each of these data structures has some performance benefits, in
 354             terms of speed, size or both. For now, note that vector-based trees
 355             and probing hash tables manipulate memory more efficiently than
 356             red-black trees and collision-chaining hash tables, and that
 357             list-based associative containers are very useful for constructing
 358             "multimaps".
 359           </para>
 360
 361           <para>
 362             Now consider a function manipulating a generic associative
 363             container,
 364           </para>
 365           <programlisting>
 366             template&lt;class Cntnr&gt;
 367             int
 368             some_op_sequence(Cntnr &amp;r_cnt)
 369             {
 370             ...
 371             }
 372           </programlisting>
 373
 374           <para>
 375             Ideally, the underlying data structure
 376             of <classname>Cntnr</classname> would not affect what can be
 377             done with <varname>r_cnt</varname>.  Unfortunately, this is not
 378             the case.
 379           </para>
 380
 381           <para>
 382             For example, if <classname>Cntnr</classname>
 383             is <classname>std::map</classname>, then the function can
 384             use
 385           </para>
 386           <programlisting>
 387             std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar)
 388           </programlisting>
 389           <para>
 390             in order to apply <classname>foobar</classname> to all
 391             elements between <classname>foo</classname> and
 392             <classname>bar</classname>. If
 393             <classname>Cntnr</classname> is a hash-based container,
 394             then this call's results are undefined.
 395           </para>
 396
 397           <para>
 398             Also, if <classname>Cntnr</classname> is tree-based, the type
 399             and object of the comparison functor can be
 400             accessed. If <classname>Cntnr</classname> is hash based, these
 401             queries are nonsensical.
 402           </para>
 403
 404           <para>
 405             There are various other differences based on the container's
 406             underlying data structure. For one, they can be constructed by,
 407             and queried for, different policies. Furthermore:
 408           </para>
 409
 410           <orderedlist>
 411             <listitem>
 412               <para>
 413                 Containers based on C, D, E and F store elements in a
 414                 meaningful order; the others store elements in a meaningless
 415                 (and probably time-varying) order. By implication, only
 416                 containers based on C, D, E and F can
 417                 support <function>erase</function> operations taking an
 418                 iterator and returning an iterator to the following element
 419                 without performance loss.
 420               </para>
 421             </listitem>
 422
 423             <listitem>
 424               <para>
 425                 Containers based on C, D, E, and F can be split and joined
 426                 efficiently, while the others cannot. Containers based on C
 427                 and D, furthermore, can guarantee that this is exception-free;
 428                 containers based on E cannot guarantee this.
 429               </para>
 430             </listitem>
 431
 432             <listitem>
 433               <para>
 434                 Containers based on all but E can guarantee that
 435                 erasing an element is exception free; containers based on E
 436                 cannot guarantee this. Containers based on all but B and E
 437                 can guarantee that modifying an object of their type does
 438                 not invalidate iterators or references to their elements,
 439                 while containers based on B and E cannot. Containers based
 440                 on C, D, and E can furthermore make a stronger guarantee,
 441                 namely that modifying an object of their type does not
 442                 affect the order of iterators.
 443               </para>
 444             </listitem>
 445           </orderedlist>
 446
 447           <para>
 448             A unified tag and traits system (as used for the C++ standard
 449             library iterators, for example) can ease generic manipulation of
 450             associative containers based on different underlying data
 451             structures.
 452           </para>
 453
 454         </section>
 455
 456         <section xml:id="motivation.associative.iterators">
 457           <info><title>Iterators</title></info>
 458           <para>
 459             Iterators are centric to the design of the standard library
 460             containers, because of the container/algorithm/iterator
 461             decomposition that allows an algorithm to operate on a range
 462             through iterators of some sequence.  Iterators, then, are useful
 463             because they allow going over a
 464             specific <emphasis>sequence</emphasis>.  The standard library
 465             also uses iterators for accessing a
 466             specific <emphasis>element</emphasis>: when an associative
 467             container returns one through <function>find</function>. The
 468             standard library consistently uses the same types of iterators
 469             for both purposes: going over a range, and accessing a specific
 470             found element. Before the introduction of hash-based containers
 471             to the standard library, this made sense (with the exception of
 472             priority queues, which are discussed later).
 473           </para>
 474
 475           <para>
 476             Using the standard associative containers together with
 477             non-order-preserving associative containers (and also because of
 478             priority-queues container), there is a possible need for
 479             different types of iterators for self-organizing containers:
 480             the iterator concept seems overloaded to mean two different
 481             things (in some cases). <remark> XXX
 482             "ds_gen.html#find_range">Design::Associative
 483             Containers::Data-Structure Genericity::Point-Type and Range-Type
 484             Methods</remark>.
 485           </para>
 486
 487           <section xml:id="associative.iterators.using">
 488             <info>
 489               <title>Using Point Iterators for Range Operations</title>
 490             </info>
 491             <para>
 492               Suppose <classname>cntnr</classname> is some associative
 493               container, and say <varname>c</varname> is an object of
 494               type <classname>cntnr</classname>. Then what will be the outcome
 495               of
 496             </para>
 497
 498             <programlisting>
 499               std::for_each(c.find(1), c.find(5), foo);
 500             </programlisting>
 501
 502             <para>
 503               If <classname>cntnr</classname> is a tree-based container
 504               object, then an in-order walk will
 505               apply <classname>foo</classname> to the relevant elements,
 506               as in the graphic below, label A. If <varname>c</varname> is
 507               a hash-based container, then the order of elements between any
 508               two elements is undefined (and probably time-varying); there is
 509               no guarantee that the elements traversed will coincide with the
 510               <emphasis>logical</emphasis> elements between 1 and 5, as in
 511               label B.
 512             </para>
 513
 514             <figure>
 515               <title>Range Iteration in Different Data Structures</title>
 516               <mediaobject>
 517                 <imageobject>
 518                   <imagedata align="center" format="PNG" scale="100"
 519                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterators_range_ops_1.png"/>
 520                 </imageobject>
 521                 <textobject>
 522                   <phrase>Node Invariants</phrase>
 523                 </textobject>
 524               </mediaobject>
 525             </figure>
 526
 527             <para>
 528               In our opinion, this problem is not caused just because
 529               red-black trees are order preserving while
 530               collision-chaining hash tables are (generally) not - it
 531               is more fundamental. Most of the standard's containers
 532               order sequences in a well-defined manner that is
 533               determined by their <emphasis>interface</emphasis>:
 534               calling <function>insert</function> on a tree-based
 535               container modifies its sequence in a predictable way, as
 536               does calling <function>push_back</function> on a list or
 537               a vector. Conversely, collision-chaining hash tables,
 538               probing hash tables, priority queues, and list-based
 539               containers (which are very useful for "multimaps") are
 540               self-organizing data structures; the effect of each
 541               operation modifies their sequences in a manner that is
 542               (practically) determined by their
 543               <emphasis>implementation</emphasis>.
 544             </para>
 545
 546             <para>
 547               Consequently, applying an algorithm to a sequence obtained from most
 548               containers may or may not make sense, but applying it to a
 549               sub-sequence of a self-organizing container does not.
 550             </para>
 551           </section>
 552
 553           <section xml:id="associative.iterators.cost">
 554             <info>
 555               <title>Cost to Point Iterators to Enable Range Operations</title>
 556             </info>
 557             <para>
 558               Suppose <varname>c</varname> is some collision-chaining
 559               hash-based container object, and one calls
 560             </para>
 561             <programlisting>c.find(3)</programlisting>
 562             <para>
 563               Then what composes the returned iterator?
 564             </para>
 565
 566             <para>
 567               In the graphic below, label A shows the simplest (and
 568               most efficient) implementation of a collision-chaining
 569               hash table.  The little box marked
 570               <classname>point_iterator</classname> shows an object
 571               that contains a pointer to the element's node. Note that
 572               this "iterator" has no way to move to the next element (
 573               it cannot support
 574               <function>operator++</function>). Conversely, the little
 575               box marked <classname>iterator</classname> stores both a
 576               pointer to the element, as well as some other
 577               information (the bucket number of the element). the
 578               second iterator, then, is "heavier" than the first one-
 579               it requires more time and space. If we were to use a
 580               different container to cross-reference into this
 581               hash-table using these iterators - it would take much
 582               more space. As noted above, nothing much can be done by
 583               incrementing these iterators, so why is this extra
 584               information needed?
 585             </para>
 586
 587             <para>
 588               Alternatively, one might create a collision-chaining hash-table
 589               where the lists might be linked, forming a monolithic total-element
 590               list, as in the graphic below, label B.  Here the iterators are as
 591               light as can be, but the hash-table's operations are more
 592               complicated.
 593             </para>
 594
 595             <figure>
 596               <title>Point Iteration in Hash Data Structures</title>
 597               <mediaobject>
 598                 <imageobject>
 599                   <imagedata align="center" format="PNG" scale="100"
 600                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterators_range_ops_2.png"/>
 601                 </imageobject>
 602                 <textobject>
 603                   <phrase>Point Iteration in Hash Data Structures</phrase>
 604                 </textobject>
 605               </mediaobject>
 606             </figure>
 607
 608             <para>
 609               It should be noted that containers based on collision-chaining
 610               hash-tables are not the only ones with this type of behavior;
 611               many other self-organizing data structures display it as well.
 612             </para>
 613           </section>
 614
 615           <section xml:id="associative.iterators.invalidation">
 616             <info><title>Invalidation Guarantees</title></info>
 617             <para>Consider the following snippet:</para>
 618             <programlisting>
 619               it = c.find(3);
 620               c.erase(5);
 621             </programlisting>
 622
 623             <para>
 624               Following the call to <classname>erase</classname>, what is the
 625               validity of <classname>it</classname>: can it be de-referenced?
 626               can it be incremented?
 627             </para>
 628
 629             <para>
 630               The answer depends on the underlying data structure of the
 631               container. The graphic below shows three cases: A1 and A2 show
 632               a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
 633               show a collision-chaining hash table.
 634             </para>
 635
 636             <figure>
 637               <title>Effect of erase in different underlying data structures</title>
 638               <mediaobject>
 639                 <imageobject>
 640                   <imagedata align="center" format="PNG" scale="100"
 641                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_invalidation_guarantee_erase.png"/>
 642                 </imageobject>
 643                 <textobject>
 644                   <phrase>Effect of erase in different underlying data structures</phrase>
 645                 </textobject>
 646               </mediaobject>
 647             </figure>
 648
 649             <orderedlist>
 650               <listitem>
 651                 <para>
 652                   Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
 653                   be de-referenced and incremented. The sequence of iterators
 654                   changed, but in a way that is well-defined by the interface.
 655                 </para>
 656               </listitem>
 657
 658               <listitem>
 659                 <para>
 660                   Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
 661                   not valid at all - it cannot be de-referenced or
 662                   incremented; the order of iterators changed in a way that is
 663                   (practically) determined by the implementation and not by
 664                   the interface.
 665                 </para>
 666               </listitem>
 667
 668               <listitem>
 669                 <para>
 670                   Erasing 5 from C1 yields C2. Here the situation is more
 671                   complicated. On the one hand, there is no problem in
 672                   de-referencing <classname>it</classname>. On the other hand,
 673                   the order of iterators changed in a way that is
 674                   (practically) determined by the implementation and not by
 675                   the interface.
 676                 </para>
 677               </listitem>
 678             </orderedlist>
 679
 680             <para>
 681               So in the standard library containers, it is not always possible
 682               to express whether <varname>it</varname> is valid or not. This
 683               is true also for <function>insert</function>. Again, the
 684               iterator concept seems overloaded.
 685             </para>
 686           </section>
 687         </section> <!--iterators-->
 688
 689
 690         <section xml:id="motivation.associative.functions">
 691           <info><title>Functional</title></info>
 692           <para>
 693           </para>
 694
 695           <para>
 696             The design of the functional overlay to the underlying data
 697             structures differs slightly from some of the conventions used in
 698             the C++ standard.  A strict public interface of methods that
 699             comprise only operations which depend on the class's internal
 700             structure; other operations are best designed as external
 701             functions. (See <xref linkend="biblio.meyers02both"/>).With this
 702             rubric, the standard associative containers lack some useful
 703             methods, and provide other methods which would be better
 704             removed.
 705           </para>
 706
 707           <section xml:id="motivation.associative.functions.erase">
 708             <info><title><function>erase</function></title></info>
 709
 710             <orderedlist>
 711               <listitem>
 712                 <para>
 713                   Order-preserving standard associative containers provide the
 714                   method
 715                 </para>
 716                 <programlisting>
 717                   iterator
 718                   erase(iterator it)
 719                 </programlisting>
 720
 721                 <para>
 722                   which takes an iterator, erases the corresponding
 723                   element, and returns an iterator to the following
 724                   element. Also standardd hash-based associative
 725                   containers provide this method. This seemingly
 726                   increasesgenericity between associative containers,
 727                   since it is possible to use
 728                 </para>
 729                 <programlisting>
 730                   typename C::iterator it = c.begin();
 731                   typename C::iterator e_it = c.end();
 732
 733                   while(it != e_it)
 734                   it = pred(*it)? c.erase(it) : ++it;
 735                 </programlisting>
 736
 737                 <para>
 738                   in order to erase from a container object <varname>
 739                   c</varname> all element which match a
 740                   predicate <classname>pred</classname>. However, in a
 741                   different sense this actually decreases genericity: an
 742                   integral implication of this method is that tree-based
 743                   associative containers' memory use is linear in the total
 744                   number of elements they store, while hash-based
 745                   containers' memory use is unbounded in the total number of
 746                   elements they store. Assume a hash-based container is
 747                   allowed to decrease its size when an element is
 748                   erased. Then the elements might be rehashed, which means
 749                   that there is no "next" element - it is simply
 750                   undefined. Consequently, it is possible to infer from the
 751                   fact that the standard library's hash-based containers
 752                   provide this method that they cannot downsize when
 753                   elements are erased. As a consequence, different code is
 754                   needed to manipulate different containers, assuming that
 755                   memory should be conserved. Therefor, this library's
 756                   non-order preserving associative containers omit this
 757                   method.
 758                 </para>
 759               </listitem>
 760
 761               <listitem>
 762                 <para>
 763                   All associative containers include a conditional-erase method
 764                 </para>
 765                 <programlisting>
 766                   template&lt;
 767                   class Pred&gt;
 768                   size_type
 769                   erase_if
 770                   (Pred pred)
 771                 </programlisting>
 772                 <para>
 773                   which erases all elements matching a predicate. This is probably the
 774                   only way to ensure linear-time multiple-item erase which can
 775                   actually downsize a container.
 776                 </para>
 777               </listitem>
 778
 779               <listitem>
 780                 <para>
 781                   The standard associative containers provide methods for
 782                   multiple-item erase of the form
 783                 </para>
 784                 <programlisting>
 785                   size_type
 786                   erase(It b, It e)
 787                 </programlisting>
 788                 <para>
 789                   erasing a range of elements given by a pair of
 790                   iterators. For tree-based or trie-based containers, this can
 791                   implemented more efficiently as a (small) sequence of split
 792                   and join operations. For other, unordered, containers, this
 793                   method isn't much better than an external loop. Moreover,
 794                   if <varname>c</varname> is a hash-based container,
 795                   then
 796                 </para>
 797                 <programlisting>
 798                   c.erase(c.find(2), c.find(5))
 799                 </programlisting>
 800                 <para>
 801                   is almost certain to do something
 802                   different than erasing all elements whose keys are between 2
 803                   and 5, and is likely to produce other undefined behavior.
 804                 </para>
 805               </listitem>
 806             </orderedlist>
 807           </section> <!-- erase -->
 808
 809           <section xml:id="motivation.associative.functions.split">
 810             <info>
 811               <title>
 812                 <function>split</function> and <function>join</function>
 813               </title>
 814             </info>
 815             <para>
 816               It is well-known that tree-based and trie-based container
 817               objects can be efficiently split or joined (See
 818               <xref linkend="biblio.clrs2001"/>). Externally splitting or
 819               joining trees is super-linear, and, furthermore, can throw
 820               exceptions. Split and join methods, consequently, seem good
 821               choices for tree-based container methods, especially, since as
 822               noted just before, they are efficient replacements for erasing
 823               sub-sequences.
 824             </para>
 825
 826           </section> <!-- split -->
 827
 828           <section xml:id="motivation.associative.functions.insert">
 829             <info>
 830               <title>
 831                 <function>insert</function>
 832               </title>
 833             </info>
 834             <para>
 835               The standard associative containers provide methods of the form
 836             </para>
 837             <programlisting>
 838               template&lt;class It&gt;
 839               size_type
 840               insert(It b, It e);
 841             </programlisting>
 842
 843             <para>
 844               for inserting a range of elements given by a pair of
 845               iterators. At best, this can be implemented as an external loop,
 846               or, even more efficiently, as a join operation (for the case of
 847               tree-based or trie-based containers). Moreover, these methods seem
 848               similar to constructors taking a range given by a pair of
 849               iterators; the constructors, however, are transactional, whereas
 850               the insert methods are not; this is possibly confusing.
 851             </para>
 852
 853           </section> <!-- insert -->
 854
 855           <section xml:id="motivation.associative.functions.compare">
 856             <info>
 857               <title>
 858                 <function>operator==</function> and <function>operator&lt;=</function>
 859               </title>
 860             </info>
 861
 862             <para>
 863               Associative containers are parametrized by policies allowing to
 864               test key equivalence: a hash-based container can do this through
 865               its equivalence functor, and a tree-based container can do this
 866               through its comparison functor. In addition, some standard
 867               associative containers have global function operators, like
 868               <function>operator==</function> and <function>operator&lt;=</function>,
 869               that allow comparing entire associative containers.
 870             </para>
 871
 872             <para>
 873               In our opinion, these functions are better left out. To begin
 874               with, they do not significantly improve over an external
 875               loop. More importantly, however, they are possibly misleading -
 876               <function>operator==</function>, for example, usually checks for
 877               equivalence, or interchangeability, but the associative
 878               container cannot check for values' equivalence, only keys'
 879               equivalence; also, are two containers considered equivalent if
 880               they store the same values in different order? this is an
 881               arbitrary decision.
 882             </para>
 883           </section> <!-- compare -->
 884
 885         </section>  <!-- functional -->
 886
 887       </section> <!--associative-->
 888
 889       <section xml:id="pbds.intro.motivation.priority_queue">
 890         <info><title>Priority Queues</title></info>
 891
 892         <section xml:id="motivation.priority_queue.policy">
 893           <info><title>Policy Choices</title></info>
 894
 895           <para>
 896             Priority queues are containers that allow efficiently inserting
 897             values and accessing the maximal value (in the sense of the
 898             container's comparison functor). Their interface
 899             supports <function>push</function>
 900             and <function>pop</function>. The standard
 901             container <classname>std::priorityqueue</classname> indeed support
 902             these methods, but little else. For algorithmic and
 903             software-engineering purposes, other methods are needed:
 904           </para>
 905
 906           <orderedlist>
 907             <listitem>
 908               <para>
 909                 Many graph algorithms (see
 910                 <xref linkend="biblio.clrs2001"/>) require increasing a
 911                 value in a priority queue (again, in the sense of the
 912                 container's comparison functor), or joining two
 913                 priority-queue objects.
 914               </para>
 915             </listitem>
 916
 917             <listitem>
 918               <para>The return type of <classname>priority_queue</classname>'s
 919               <function>push</function> method is a point-type iterator, which can
 920               be used for modifying or erasing arbitrary values. For
 921               example:</para>
 922               <programlisting>
 923                 priority_queue&lt;int&gt; p;
 924                 priority_queue&lt;int&gt;::point_iterator it = p.push(3);
 925                 p.modify(it, 4);
 926               </programlisting>
 927
 928               <para>These types of cross-referencing operations are necessary
 929               for making priority queues useful for different applications,
 930               especially graph applications.</para>
 931
 932             </listitem>
 933             <listitem>
 934               <para>
 935                 It is sometimes necessary to erase an arbitrary value in a
 936                 priority queue. For example, consider
 937                 the <function>select</function> function for monitoring
 938                 file descriptors:
 939               </para>
 940
 941               <programlisting>
 942                 int
 943                 select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
 944                 struct timeval *timeout);
 945               </programlisting>
 946               <para>
 947                 then, as the select documentation states:
 948               </para>
 949               <para>
 950                 <quote>
 951                   The nfds argument specifies the range of file
 952                   descriptors to be tested. The select() function tests file
 953                 descriptors in the range of 0 to nfds-1.</quote>
 954               </para>
 955
 956               <para>
 957                 It stands to reason, therefore, that we might wish to
 958                 maintain a minimal value for <varname>nfds</varname>, and
 959                 priority queues immediately come to mind. Note, though, that
 960                 when a socket is closed, the minimal file description might
 961                 change; in the absence of an efficient means to erase an
 962                 arbitrary value from a priority queue, we might as well
 963                 avoid its use altogether.
 964               </para>
 965
 966               <para>
 967                 The standard containers typically support iterators. It is
 968                 somewhat unusual
 969                 for <classname>std::priority_queue</classname> to omit them
 970                 (See <xref linkend="biblio.meyers01stl"/>). One might
 971                 ask why do priority queues need to support iterators, since
 972                 they are self-organizing containers with a different purpose
 973                 than abstracting sequences. There are several reasons:
 974               </para>
 975               <orderedlist>
 976                 <listitem>
 977                   <para>
 978                     Iterators (even in self-organizing containers) are
 979                     useful for many purposes: cross-referencing
 980                     containers, serialization, and debugging code that uses
 981                     these containers.
 982                   </para>
 983                 </listitem>
 984
 985                 <listitem>
 986                   <para>
 987                     The standard library's hash-based containers support
 988                     iterators, even though they too are self-organizing
 989                     containers with a different purpose than abstracting
 990                     sequences.
 991                   </para>
 992                 </listitem>
 993
 994                 <listitem>
 995                   <para>
 996                     In standard-library-like containers, it is natural to specify the
 997                     interface of operations for modifying a value or erasing
 998                     a value (discussed previously) in terms of a iterators.
 999                     It should be noted that the standard
1000                     containers also use iterators for accessing and
1001                     manipulating a specific value. In hash-based
1002                     containers, one checks the existence of a key by
1003                     comparing the iterator returned by <function>find</function> to the
1004                     iterator returned by <function>end</function>, and not by comparing a
1005                     pointer returned by <function>find</function> to <type>NULL</type>.
1006                   </para>
1007                 </listitem>
1008               </orderedlist>
1009             </listitem>
1010           </orderedlist>
1011
1012         </section>
1013
1014         <section xml:id="motivation.priority_queue.underlying">
1015           <info><title>Underlying Data Structures</title></info>
1016
1017           <para>
1018             There are three main implementations of priority queues: the
1019             first employs a binary heap, typically one which uses a
1020             sequence; the second uses a tree (or forest of trees), which is
1021             typically less structured than an associative container's tree;
1022             the third simply uses an associative container. These are
1023             shown in the figure below with labels A1 and A2, B, and C.
1024           </para>
1025
1026           <figure>
1027             <title>Underlying Priority Queue Data Structures</title>
1028             <mediaobject>
1029               <imageobject>
1030                 <imagedata align="center" format="PNG" scale="100"
1031                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_different_underlying_dss_2.png"/>
1032               </imageobject>
1033               <textobject>
1034                 <phrase>Underlying Priority Queue Data Structures</phrase>
1035               </textobject>
1036             </mediaobject>
1037           </figure>
1038
1039           <para>
1040             No single implementation can completely replace any of the
1041             others. Some have better <function>push</function>
1042             and <function>pop</function> amortized performance, some have
1043             better bounded (worst case) response time than others, some
1044             optimize a single method at the expense of others, etc. In
1045             general the "best" implementation is dictated by the specific
1046             problem.
1047           </para>
1048
1049           <para>
1050             As with associative containers, the more implementations
1051             co-exist, the more necessary a traits mechanism is for handling
1052             generic containers safely and efficiently. This is especially
1053             important for priority queues, since the invalidation guarantees
1054             of one of the most useful data structures - binary heaps - is
1055             markedly different than those of most of the others.
1056           </para>
1057
1058         </section>
1059
1060         <section xml:id="motivation.priority_queue.binary_heap">
1061           <info><title>Binary Heaps</title></info>
1062
1063
1064           <para>
1065             Binary heaps are one of the most useful underlying
1066             data structures for priority queues. They are very efficient in
1067             terms of memory (since they don't require per-value structure
1068             metadata), and have the best amortized <function>push</function> and
1069             <function>pop</function> performance for primitive types like
1070             <type>int</type>.
1071           </para>
1072
1073           <para>
1074             The standard library's <classname>priority_queue</classname>
1075             implements this data structure as an adapter over a sequence,
1076             typically
1077             <classname>std::vector</classname>
1078             or <classname>std::deque</classname>, which correspond to labels
1079             A1 and A2 respectively in the graphic above.
1080           </para>
1081
1082           <para>
1083             This is indeed an elegant example of the adapter concept and
1084             the algorithm/container/iterator decomposition. (See <xref linkend="biblio.nelson96stlpq"/>). There are
1085             several reasons why a binary-heap priority queue
1086             may be better implemented as a container instead of a
1087             sequence adapter:
1088           </para>
1089
1090           <orderedlist>
1091             <listitem>
1092               <para>
1093                 <classname>std::priority_queue</classname> cannot erase values
1094                 from its adapted sequence (irrespective of the sequence
1095                 type). This means that the memory use of
1096                 an <classname>std::priority_queue</classname> object is always
1097                 proportional to the maximal number of values it ever contained,
1098                 and not to the number of values that it currently
1099                 contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
1100                 This implementation of binary heaps acts very differently than
1101                 other underlying data structures (See also pairing heaps).
1102               </para>
1103             </listitem>
1104
1105             <listitem>
1106               <para>
1107                 Some combinations of adapted sequences and value types
1108                 are very inefficient or just don't make sense. If one uses
1109                 <classname>std::priority_queue&lt;std::vector&lt;std::string&gt;
1110                 &gt; &gt;</classname>, for example, then not only will each
1111                 operation perform a logarithmic number of
1112                 <classname>std::string</classname> assignments, but, furthermore, any
1113                 operation (including <function>pop</function>) can render the container
1114                 useless due to exceptions. Conversely, if one uses
1115                 <classname>std::priority_queue&lt;std::deque&lt;int&gt; &gt;
1116                 &gt;</classname>, then each operation uses incurs a logarithmic
1117                 number of indirect accesses (through pointers) unnecessarily.
1118                 It might be better to let the container make a conservative
1119                 deduction whether to use the structure in the graphic above, labels A1 or A2.
1120               </para>
1121             </listitem>
1122
1123             <listitem>
1124               <para>
1125                 There does not seem to be a systematic way to determine
1126                 what exactly can be done with the priority queue.
1127               </para>
1128               <orderedlist>
1129                 <listitem>
1130                   <para>
1131                     If <classname>p</classname> is a priority queue adapting an
1132                     <classname>std::vector</classname>, then it is possible to iterate over
1133                     all values by using <function>&amp;p.top()</function> and
1134                     <function>&amp;p.top() + p.size()</function>, but this will not work
1135                     if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
1136                     case, one cannot use <classname>p.begin()</classname> and
1137                     <classname>p.end()</classname>. If a different sequence is adapted, it
1138                     is even more difficult to determine what can be
1139                     done.
1140                   </para>
1141                 </listitem>
1142
1143                 <listitem>
1144                   <para>
1145                     If <varname>p</varname> is a priority queue adapting an
1146                     <classname>std::deque</classname>, then the reference return by
1147                   </para>
1148                   <programlisting>
1149                     p.top()
1150                   </programlisting>
1151                   <para>
1152                     will remain valid until it is popped,
1153                     but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1154                     next <function>push</function> will invalidate it. If a different
1155                     sequence is adapted, it is even more difficult to
1156                     determine what can be done.
1157                   </para>
1158                 </listitem>
1159               </orderedlist>
1160             </listitem>
1161
1162             <listitem>
1163               <para>
1164                 Sequence-based binary heaps can still implement
1165                 linear-time <function>erase</function> and <function>modify</function> operations.
1166                 This means that if one needs to erase a small
1167                 (say logarithmic) number of values, then one might still
1168                 choose this underlying data structure. Using
1169                 <classname>std::priority_queue</classname>, however, this will generally
1170                 change the order of growth of the entire sequence of
1171                 operations.
1172               </para>
1173             </listitem>
1174           </orderedlist>
1175
1176         </section>
1177       </section>
1178     </section> <!-- goals/motivation -->
1179   </section> <!-- intro -->
1180
1181   <!-- S02: Using -->
1182   <section xml:id="containers.pbds.using">
1183     <info><title>Using</title></info>
1184     <?dbhtml filename="policy_data_structures_using.html"?>
1185
1186     <section xml:id="pbds.using.prereq">
1187       <info><title>Prerequisites</title></info>
1188
1189       <para>The library contains only header files, and does not require any
1190       other libraries except the standard C++ library . All classes are
1191       defined in namespace <code>__gnu_pbds</code>. The library internally
1192       uses macros beginning with <code>PB_DS</code>, but
1193       <code>#undef</code>s anything it <code>#define</code>s (except for
1194       header guards). Compiling the library in an environment where macros
1195       beginning in <code>PB_DS</code> are defined, may yield unpredictable
1196       results in compilation, execution, or both.</para>
1197
1198       <para>
1199         Further dependencies are necessary to create the visual output for the
1200         performance tests. To create these graphs, two additional packages
1201         will be needed: <command>pychart</command> and <command>Beautiful
1202         Soup</command>.
1203       </para>
1204     </section>
1205
1206     <section xml:id="pbds.using.organization">
1207       <info><title>Organization</title></info>
1208
1209       <para>
1210         The various data structures are organized as follows.
1211       </para>
1212
1213       <itemizedlist>
1214         <listitem>
1215           <para>
1216             Branch-Based
1217           </para>
1218
1219           <itemizedlist>
1220             <listitem>
1221               <para>
1222                 <classname>basic_branch</classname>
1223                 is an abstract base class for branched-based
1224                 associative-containers
1225               </para>
1226             </listitem>
1227
1228             <listitem>
1229               <para>
1230                 <classname>tree</classname>
1231                 is a concrete base class for tree-based
1232                 associative-containers
1233               </para>
1234             </listitem>
1235
1236             <listitem>
1237               <para>
1238                 <classname>trie</classname>
1239                 is a concrete base class trie-based
1240                 associative-containers
1241               </para>
1242             </listitem>
1243           </itemizedlist>
1244         </listitem>
1245
1246         <listitem>
1247           <para>
1248             Hash-Based
1249           </para>
1250           <itemizedlist>
1251             <listitem>
1252               <para>
1253                 <classname>basic_hash_table</classname>
1254                 is an abstract base class for hash-based
1255                 associative-containers
1256               </para>
1257             </listitem>
1258
1259             <listitem>
1260               <para>
1261                 <classname>cc_hash_table</classname>
1262                 is a concrete collision-chaining hash-based
1263                 associative-containers
1264               </para>
1265             </listitem>
1266
1267             <listitem>
1268               <para>
1269                 <classname>gp_hash_table</classname>
1270                 is a concrete (general) probing hash-based
1271                 associative-containers
1272               </para>
1273             </listitem>
1274           </itemizedlist>
1275         </listitem>
1276
1277         <listitem>
1278           <para>
1279             List-Based
1280           </para>
1281           <itemizedlist>
1282             <listitem>
1283               <para>
1284                 <classname>list_update</classname>
1285                 list-based update-policy associative container
1286               </para>
1287             </listitem>
1288           </itemizedlist>
1289         </listitem>
1290         <listitem>
1291           <para>
1292             Heap-Based
1293           </para>
1294           <itemizedlist>
1295             <listitem>
1296               <para>
1297                 <classname>priority_queue</classname>
1298                 A priority queue.
1299               </para>
1300             </listitem>
1301           </itemizedlist>
1302         </listitem>
1303       </itemizedlist>
1304
1305       <para>
1306         The hierarchy is composed naturally so that commonality is
1307         captured by base classes. Thus <function>operator[]</function>
1308         is defined at the base of any hierarchy, since all derived
1309         containers support it. Conversely <function>split</function> is
1310         defined in <classname>basic_branch</classname>, since only
1311         tree-like containers support it.
1312       </para>
1313
1314       <para>
1315         In addition, there are the following diagnostics classes,
1316         used to report errors specific to this library's data
1317         structures.
1318       </para>
1319
1320       <figure>
1321         <title>Exception Hierarchy</title>
1322         <mediaobject>
1323           <imageobject>
1324             <imagedata align="center" format="PDF" scale="75"
1325                        fileref="../images/pbds_exception_hierarchy.pdf"/>
1326           </imageobject>
1327           <imageobject>
1328             <imagedata align="center" format="PNG" scale="100"
1329                        fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_exception_hierarchy.png"/>
1330           </imageobject>
1331           <textobject>
1332             <phrase>Exception Hierarchy</phrase>
1333           </textobject>
1334         </mediaobject>
1335       </figure>
1336
1337     </section>
1338
1339     <section xml:id="pbds.using.tutorial">
1340       <info><title>Tutorial</title></info>
1341
1342       <section xml:id="pbds.using.tutorial.basic">
1343         <info><title>Basic Use</title></info>
1344
1345         <para>
1346           For the most part, the policy-based containers containers in
1347           namespace <literal>__gnu_pbds</literal> have the same interface as
1348           the equivalent containers in the standard C++ library, except for
1349           the names used for the container classes themselves. For example,
1350           this shows basic operations on a collision-chaining hash-based
1351           container:
1352         </para>
1353         <programlisting>
1354           #include &lt;ext/pb_ds/assoc_container.h&gt;
1355
1356           int main()
1357           {
1358           __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
1359           c[2] = 'b';
1360           assert(c.find(1) == c.end());
1361           };
1362         </programlisting>
1363
1364         <para>
1365           The container is called
1366           <classname>__gnu_pbds::cc_hash_table</classname> instead of
1367           <classname>std::unordered_map</classname>, since <quote>unordered
1368           map</quote> does not necessarily mean a hash-based map as implied by
1369           the C++ library (C++0x or TR1). For example, list-based associative
1370           containers, which are very useful for the construction of
1371           "multimaps," are also unordered.
1372         </para>
1373
1374         <para>This snippet shows a red-black tree based container:</para>
1375
1376         <programlisting>
1377           #include &lt;ext/pb_ds/assoc_container.h&gt;
1378
1379           int main()
1380           {
1381           __gnu_pbds::tree&lt;int, char&gt; c;
1382           c[2] = 'b';
1383           assert(c.find(2) != c.end());
1384           };
1385         </programlisting>
1386
1387         <para>The container is called <classname>tree</classname> instead of
1388         <classname>map</classname> since the underlying data structures are
1389         being named with specificity.
1390         </para>
1391
1392         <para>
1393           The member function naming convention is to strive to be the same as
1394           the equivalent member functions in other C++ standard library
1395           containers. The familiar methods are unchanged:
1396           <function>begin</function>, <function>end</function>,
1397           <function>size</function>, <function>empty</function>, and
1398           <function>clear</function>.
1399         </para>
1400
1401         <para>
1402           This isn't to say that things are exactly as one would expect, given
1403           the container requirments and interfaces in the C++ standard.
1404         </para>
1405
1406         <para>
1407           The names of containers' policies and policy accessors are
1408           different then the usual. For example, if <type>hash_type</type> is
1409         some type of hash-based container, then</para>
1410
1411         <programlisting>
1412           hash_type::hash_fn
1413         </programlisting>
1414
1415         <para>
1416           gives the type of its hash functor, and if <varname>obj</varname> is
1417           some hash-based container object, then
1418         </para>
1419
1420         <programlisting>
1421           obj.get_hash_fn()
1422         </programlisting>
1423
1424         <para>will return a reference to its hash-functor object.</para>
1425
1426
1427         <para>
1428           Similarly, if <type>tree_type</type> is some type of tree-based
1429           container, then
1430         </para>
1431
1432         <programlisting>
1433           tree_type::cmp_fn
1434         </programlisting>
1435
1436         <para>
1437           gives the type of its comparison functor, and if
1438           <varname>obj</varname> is some tree-based container object,
1439           then
1440         </para>
1441
1442         <programlisting>
1443           obj.get_cmp_fn()
1444         </programlisting>
1445
1446         <para>will return a reference to its comparison-functor object.</para>
1447
1448         <para>
1449           It would be nice to give names consistent with those in the existing
1450           C++ standard (inclusive of TR1). Unfortunately, these standard
1451           containers don't consistently name types and methods. For example,
1452           <classname>std::tr1::unordered_map</classname> uses
1453           <type>hasher</type> for the hash functor, but
1454           <classname>std::map</classname> uses <type>key_compare</type> for
1455           the comparison functor. Also, we could not find an accessor for
1456           <classname>std::tr1::unordered_map</classname>'s hash functor, but
1457           <classname>std::map</classname> uses <classname>compare</classname>
1458           for accessing the comparison functor.
1459         </para>
1460
1461         <para>
1462           Instead, <literal>__gnu_pbds</literal> attempts to be internally
1463           consistent, and uses standard-derived terminology if possible.
1464         </para>
1465
1466         <para>
1467           Another source of difference is in scope:
1468           <literal>__gnu_pbds</literal> contains more types of associative
1469           containers than the standard C++ library, and more opportunities
1470           to configure these new containers, since different types of
1471           associative containers are useful in different settings.
1472         </para>
1473
1474         <para>
1475           Namespace <literal>__gnu_pbds</literal> contains different classes for
1476           hash-based containers, tree-based containers, trie-based containers,
1477           and list-based containers.
1478         </para>
1479
1480         <para>
1481           Since associative containers share parts of their interface, they
1482           are organized as a class hierarchy.
1483         </para>
1484
1485         <para>Each type or method is defined in the most-common ancestor
1486         in which it makes sense.
1487         </para>
1488
1489         <para>For example, all associative containers support iteration
1490         expressed in the following form:
1491         </para>
1492
1493         <programlisting>
1494           const_iterator
1495           begin() const;
1496
1497           iterator
1498           begin();
1499
1500           const_iterator
1501           end() const;
1502
1503           iterator
1504           end();
1505         </programlisting>
1506
1507         <para>
1508           But not all containers contain or use hash functors. Yet, both
1509           collision-chaining and (general) probing hash-based associative
1510           containers have a hash functor, so
1511           <classname>basic_hash_table</classname> contains the interface:
1512         </para>
1513
1514         <programlisting>
1515           const hash_fn&amp;
1516           get_hash_fn() const;
1517
1518           hash_fn&amp;
1519           get_hash_fn();
1520         </programlisting>
1521
1522         <para>
1523           so all hash-based associative containers inherit the same
1524           hash-functor accessor methods.
1525         </para>
1526
1527       </section> <!--basic use -->
1528
1529       <section xml:id="pbds.using.tutorial.configuring">
1530         <info>
1531           <title>
1532             Configuring via Template Parameters
1533           </title>
1534         </info>
1535
1536         <para>
1537           In general, each of this library's containers is
1538           parametrized by more policies than those of the standard library. For
1539           example, the standard hash-based container is parametrized as
1540           follows:
1541         </para>
1542         <programlisting>
1543           template&lt;typename Key, typename Mapped, typename Hash,
1544           typename Pred, typename Allocator, bool Cache_Hashe_Code&gt;
1545           class unordered_map;
1546         </programlisting>
1547
1548         <para>
1549           and so can be configured by key type, mapped type, a functor
1550           that translates keys to unsigned integral types, an equivalence
1551           predicate, an allocator, and an indicator whether to store hash
1552           values with each entry. this library's collision-chaining
1553           hash-based container is parametrized as
1554         </para>
1555         <programlisting>
1556           template&lt;typename Key, typename Mapped, typename Hash_Fn,
1557           typename Eq_Fn, typename Comb_Hash_Fn,
1558           typename Resize_Policy, bool Store_Hash
1559           typename Allocator&gt;
1560           class cc_hash_table;
1561         </programlisting>
1562
1563         <para>
1564           and so can be configured by the first four types of
1565           <classname>std::tr1::unordered_map</classname>, then a
1566           policy for translating the key-hash result into a position
1567           within the table, then a policy by which the table resizes,
1568           an indicator whether to store hash values with each entry,
1569           and an allocator (which is typically the last template
1570           parameter in standard containers).
1571         </para>
1572
1573         <para>
1574           Nearly all policy parameters have default values, so this
1575           need not be considered for casual use. It is important to
1576           note, however, that hash-based containers' policies can
1577           dramatically alter their performance in different settings,
1578           and that tree-based containers' policies can make them
1579           useful for other purposes than just look-up.
1580         </para>
1581
1582
1583         <para>As opposed to associative containers, priority queues have
1584         relatively few configuration options. The priority queue is
1585         parametrized as follows:</para>
1586         <programlisting>
1587           template&lt;typename Value_Type, typename Cmp_Fn,typename Tag,
1588           typename Allocator&gt;
1589           class priority_queue;
1590         </programlisting>
1591
1592         <para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1593         <classname>Allocator</classname> parameters are the container's value type,
1594         comparison-functor type, and allocator type, respectively;
1595         these are very similar to the standard's priority queue. The
1596         <classname>Tag</classname> parameter is different: there are a number of
1597         pre-defined tag types corresponding to binary heaps, binomial
1598         heaps, etc., and <classname>Tag</classname> should be instantiated
1599         by one of them.</para>
1600
1601         <para>Note that as opposed to the
1602         <classname>std::priority_queue</classname>,
1603         <classname>__gnu_pbds::priority_queue</classname> is not a
1604         sequence-adapter; it is a regular container.</para>
1605
1606       </section>
1607
1608       <section xml:id="pbds.using.tutorial.traits">
1609         <info>
1610           <title>
1611             Querying Container Attributes
1612           </title>
1613         </info>
1614         <para></para>
1615
1616         <para>A containers underlying data structure
1617         affect their performance; Unfortunately, they can also affect
1618         their interface. When manipulating generically associative
1619         containers, it is often useful to be able to statically
1620         determine what they can support and what the cannot.
1621         </para>
1622
1623         <para>Happily, the standard provides a good solution to a similar
1624         problem - that of the different behavior of iterators. If
1625         <classname>It</classname> is an iterator, then
1626         </para>
1627         <programlisting>
1628           typename std::iterator_traits&lt;It&gt;::iterator_category
1629         </programlisting>
1630
1631         <para>is one of a small number of pre-defined tag classes, and
1632         </para>
1633         <programlisting>
1634           typename std::iterator_traits&lt;It&gt;::value_type
1635         </programlisting>
1636
1637         <para>is the value type to which the iterator "points".</para>
1638
1639         <para>
1640           Similarly, in this library, if <type>C</type> is a
1641           container, then <classname>container_traits</classname> is a
1642           trait class that stores information about the kind of
1643           container that is implemented.
1644         </para>
1645         <programlisting>
1646           typename container_traits&lt;C&gt;::container_category
1647         </programlisting>
1648         <para>
1649           is one of a small number of predefined tag structures that
1650           uniquely identifies the type of underlying data structure.
1651         </para>
1652
1653         <para>In most cases, however, the exact underlying data
1654         structure is not really important, but what is important is
1655         one of its other attributes: whether it guarantees storing
1656         elements by key order, for example. For this one can
1657         use</para>
1658         <programlisting>
1659           typename container_traits&lt;C&gt;::order_preserving
1660         </programlisting>
1661         <para>
1662           Also,
1663         </para>
1664         <programlisting>
1665           typename container_traits&lt;C&gt;::invalidation_guarantee
1666         </programlisting>
1667
1668         <para>is the container's invalidation guarantee. Invalidation
1669         guarantees are especially important regarding priority queues,
1670         since in this library's design, iterators are practically the
1671         only way to manipulate them.</para>
1672       </section>
1673
1674       <section xml:id="pbds.using.tutorial.point_range_iteration">
1675         <info>
1676           <title>
1677             Point and Range Iteration
1678           </title>
1679         </info>
1680         <para></para>
1681
1682         <para>This library differentiates between two types of methods
1683         and iterators: point-type, and range-type. For example,
1684         <function>find</function> and <function>insert</function> are point-type methods, since
1685         they each deal with a specific element; their returned
1686         iterators are point-type iterators. <function>begin</function> and
1687         <function>end</function> are range-type methods, since they are not used to
1688         find a specific element, but rather to go over all elements in
1689         a container object; their returned iterators are range-type
1690         iterators.
1691         </para>
1692
1693         <para>Most containers store elements in an order that is
1694         determined by their interface. Correspondingly, it is fine that
1695         their point-type iterators are synonymous with their range-type
1696         iterators. For example, in the following snippet
1697         </para>
1698         <programlisting>
1699           std::for_each(c.find(1), c.find(5), foo);
1700         </programlisting>
1701         <para>
1702           two point-type iterators (returned by <function>find</function>) are used
1703           for a range-type purpose - going over all elements whose key is
1704           between 1 and 5.
1705         </para>
1706
1707         <para>
1708           Conversely, the above snippet makes no sense for
1709           self-organizing containers - ones that order (and reorder)
1710           their elements by implementation. It would be nice to have a
1711           uniform iterator system that would allow the above snippet to
1712           compile only if it made sense.
1713         </para>
1714
1715         <para>
1716           This could trivially be done by specializing
1717           <function>std::for_each</function> for the case of iterators returned by
1718           <classname>std::tr1::unordered_map</classname>, but this would only solve the
1719           problem for one algorithm and one container. Fundamentally, the
1720           problem is that one can loop using a self-organizing
1721           container's point-type iterators.
1722         </para>
1723
1724         <para>
1725           This library's containers define two families of
1726           iterators: <type>point_const_iterator</type> and
1727           <type>point_iterator</type> are the iterator types returned by
1728           point-type methods; <type>const_iterator</type> and
1729           <type>iterator</type> are the iterator types returned by range-type
1730           methods.
1731         </para>
1732         <programlisting>
1733           class &lt;- some container -&gt;
1734           {
1735           public:
1736           ...
1737
1738           typedef &lt;- something -&gt; const_iterator;
1739
1740           typedef &lt;- something -&gt; iterator;
1741
1742           typedef &lt;- something -&gt; point_const_iterator;
1743
1744           typedef &lt;- something -&gt; point_iterator;
1745
1746           ...
1747
1748           public:
1749           ...
1750
1751           const_iterator begin () const;
1752
1753           iterator begin();
1754
1755           point_const_iterator find(...) const;
1756
1757           point_iterator find(...);
1758           };
1759         </programlisting>
1760
1761         <para>For
1762         containers whose interface defines sequence order , it
1763         is very simple: point-type and range-type iterators are exactly
1764         the same, which means that the above snippet will compile if it
1765         is used for an order-preserving associative container.
1766         </para>
1767
1768         <para>
1769           For self-organizing containers, however, (hash-based
1770           containers as a special example), the preceding snippet will
1771           not compile, because their point-type iterators do not support
1772           <function>operator++</function>.
1773         </para>
1774
1775         <para>In any case, both for order-preserving and self-organizing
1776         containers, the following snippet will compile:
1777         </para>
1778         <programlisting>
1779           typename Cntnr::point_iterator it = c.find(2);
1780         </programlisting>
1781
1782         <para>
1783           because a range-type iterator can always be converted to a
1784           point-type iterator.
1785         </para>
1786
1787         <para>Distingushing between iterator types also
1788         raises the point that a container's iterators might have
1789         different invalidation rules concerning their de-referencing
1790         abilities and movement abilities. This now corresponds exactly
1791         to the question of whether point-type and range-type iterators
1792         are valid. As explained above, <classname>container_traits</classname> allows
1793         querying a container for its data structure attributes. The
1794         iterator-invalidation guarantees are certainly a property of
1795         the underlying data structure, and so
1796         </para>
1797         <programlisting>
1798           container_traits&lt;C&gt;::invalidation_guarantee
1799         </programlisting>
1800
1801         <para>
1802           gives one of three pre-determined types that answer this
1803           query.
1804         </para>
1805
1806       </section>
1807     </section> <!-- tutorial -->
1808
1809     <section xml:id="pbds.using.examples">
1810       <info><title>Examples</title></info>
1811       <para>
1812         Additional code examples are provided in the source
1813         distribution, as part of the regression and performance
1814         testsuite.
1815       </para>
1816
1817       <section xml:id="pbds.using.examples.basic">
1818         <info><title>Intermediate Use</title></info>
1819
1820         <itemizedlist>
1821           <listitem>
1822             <para>
1823               Basic use of maps:
1824               <filename>basic_map.cc</filename>
1825             </para>
1826           </listitem>
1827
1828           <listitem>
1829             <para>
1830               Basic use of sets:
1831               <filename>basic_set.cc</filename>
1832             </para>
1833           </listitem>
1834
1835           <listitem>
1836             <para>
1837               Conditionally erasing values from an associative container object:
1838               <filename>erase_if.cc</filename>
1839             </para>
1840           </listitem>
1841
1842           <listitem>
1843             <para>
1844               Basic use of multimaps:
1845               <filename>basic_multimap.cc</filename>
1846             </para>
1847           </listitem>
1848
1849           <listitem>
1850             <para>
1851               Basic use of multisets:
1852               <filename>basic_multiset.cc</filename>
1853             </para>
1854           </listitem>
1855
1856           <listitem>
1857             <para>
1858               Basic use of priority queues:
1859               <filename>basic_priority_queue.cc</filename>
1860             </para>
1861           </listitem>
1862
1863           <listitem>
1864             <para>
1865               Splitting and joining priority queues:
1866               <filename>priority_queue_split_join.cc</filename>
1867             </para>
1868           </listitem>
1869
1870           <listitem>
1871             <para>
1872               Conditionally erasing values from a priority queue:
1873               <filename>priority_queue_erase_if.cc</filename>
1874             </para>
1875           </listitem>
1876         </itemizedlist>
1877
1878       </section>
1879
1880       <section xml:id="pbds.using.examples.query">
1881         <info><title>Querying with <classname>container_traits</classname> </title></info>
1882         <itemizedlist>
1883           <listitem>
1884             <para>
1885               Using <classname>container_traits</classname> to query
1886               about underlying data structure behavior:
1887               <filename>assoc_container_traits.cc</filename>
1888             </para>
1889           </listitem>
1890
1891           <listitem>
1892             <para>
1893               A non-compiling example showing wrong use of finding keys in
1894               hash-based containers: <filename>hash_find_neg.cc</filename>
1895             </para>
1896           </listitem>
1897           <listitem>
1898             <para>
1899               Using <classname>container_traits</classname>
1900               to query about underlying data structure behavior:
1901               <filename>priority_queue_container_traits.cc</filename>
1902             </para>
1903           </listitem>
1904
1905         </itemizedlist>
1906
1907       </section>
1908
1909       <section xml:id="pbds.using.examples.container">
1910         <info><title>By Container Method</title></info>
1911         <para></para>
1912
1913         <section xml:id="pbds.using.examples.container.hash">
1914           <info><title>Hash-Based</title></info>
1915
1916           <section xml:id="pbds.using.examples.container.hash.resize">
1917             <info><title>size Related</title></info>
1918
1919             <itemizedlist>
1920               <listitem>
1921                 <para>
1922                   Setting the initial size of a hash-based container
1923                   object:
1924                   <filename>hash_initial_size.cc</filename>
1925                 </para>
1926               </listitem>
1927
1928               <listitem>
1929                 <para>
1930                   A non-compiling example showing how not to resize a
1931                   hash-based container object:
1932                   <filename>hash_resize_neg.cc</filename>
1933                 </para>
1934               </listitem>
1935
1936               <listitem>
1937                 <para>
1938                   Resizing the size of a hash-based container object:
1939                   <filename>hash_resize.cc</filename>
1940                 </para>
1941               </listitem>
1942
1943               <listitem>
1944                 <para>
1945                   Showing an illegal resize of a hash-based container
1946                   object:
1947                   <filename>hash_illegal_resize.cc</filename>
1948                 </para>
1949               </listitem>
1950
1951               <listitem>
1952                 <para>
1953                   Changing the load factors of a hash-based container
1954                   object: <filename>hash_load_set_change.cc</filename>
1955                 </para>
1956               </listitem>
1957             </itemizedlist>
1958           </section>
1959
1960           <section xml:id="pbds.using.examples.container.hash.hashor">
1961             <info><title>Hashing Function Related</title></info>
1962             <para></para>
1963
1964             <itemizedlist>
1965               <listitem>
1966                 <para>
1967                   Using a modulo range-hashing function for the case of an
1968                   unknown skewed key distribution:
1969                   <filename>hash_mod.cc</filename>
1970                 </para>
1971               </listitem>
1972
1973               <listitem>
1974                 <para>
1975                   Writing a range-hashing functor for the case of a known
1976                   skewed key distribution:
1977                   <filename>shift_mask.cc</filename>
1978                 </para>
1979               </listitem>
1980
1981               <listitem>
1982                 <para>
1983                   Storing the hash value along with each key:
1984                   <filename>store_hash.cc</filename>
1985                 </para>
1986               </listitem>
1987
1988               <listitem>
1989                 <para>
1990                   Writing a ranged-hash functor:
1991                   <filename>ranged_hash.cc</filename>
1992                 </para>
1993               </listitem>
1994             </itemizedlist>
1995
1996           </section>
1997
1998         </section>
1999
2000         <section xml:id="pbds.using.examples.container.branch">
2001           <info><title>Branch-Based</title></info>
2002
2003
2004           <section xml:id="pbds.using.examples.container.branch.split">
2005             <info><title>split or join Related</title></info>
2006
2007             <itemizedlist>
2008               <listitem>
2009                 <para>
2010                   Joining two tree-based container objects:
2011                   <filename>tree_join.cc</filename>
2012                 </para>
2013               </listitem>
2014
2015               <listitem>
2016                 <para>
2017                   Splitting a PATRICIA trie container object:
2018                   <filename>trie_split.cc</filename>
2019                 </para>
2020               </listitem>
2021
2022               <listitem>
2023                 <para>
2024                   Order statistics while joining two tree-based container
2025                   objects:
2026                   <filename>tree_order_statistics_join.cc</filename>
2027                 </para>
2028               </listitem>
2029             </itemizedlist>
2030
2031           </section>
2032
2033           <section xml:id="pbds.using.examples.container.branch.invariants">
2034             <info><title>Node Invariants</title></info>
2035
2036             <itemizedlist>
2037               <listitem>
2038                 <para>
2039                   Using trees for order statistics:
2040                   <filename>tree_order_statistics.cc</filename>
2041                 </para>
2042               </listitem>
2043
2044               <listitem>
2045                 <para>
2046                   Augmenting trees to support operations on line
2047                   intervals:
2048                   <filename>tree_intervals.cc</filename>
2049                 </para>
2050               </listitem>
2051             </itemizedlist>
2052
2053           </section>
2054
2055           <section xml:id="pbds.using.examples.container.branch.trie">
2056             <info><title>trie</title></info>
2057             <itemizedlist>
2058               <listitem>
2059                 <para>
2060                   Using a PATRICIA trie for DNA strings:
2061                   <filename>trie_dna.cc</filename>
2062                 </para>
2063               </listitem>
2064
2065               <listitem>
2066                 <para>
2067                   Using a PATRICIA
2068                   trie for finding all entries whose key matches a given prefix:
2069                   <filename>trie_prefix_search.cc</filename>
2070                 </para>
2071               </listitem>
2072             </itemizedlist>
2073
2074           </section>
2075
2076         </section>
2077
2078         <section xml:id="pbds.using.examples.container.priority_queue">
2079           <info><title>Priority Queues</title></info>
2080           <itemizedlist>
2081             <listitem>
2082               <para>
2083                 Cross referencing an associative container and a priority
2084                 queue: <filename>priority_queue_xref.cc</filename>
2085               </para>
2086             </listitem>
2087
2088             <listitem>
2089               <para>
2090                 Cross referencing a vector and a priority queue using a
2091                 very simple version of Dijkstra's shortest path
2092                 algorithm:
2093                 <filename>priority_queue_dijkstra.cc</filename>
2094               </para>
2095             </listitem>
2096           </itemizedlist>
2097
2098         </section>
2099
2100
2101       </section>
2102
2103     </section>
2104
2105   </section> <!-- using -->
2106
2107   <!-- S03: Design -->
2108
2109
2110 <section xml:id="containers.pbds.design">
2111   <info><title>Design</title></info>
2112   <?dbhtml filename="policy_data_structures_design.html"?>
2113   <para></para>
2114
2115   <section xml:id="pbds.design.concepts">
2116     <info><title>Concepts</title></info>
2117
2118     <section xml:id="pbds.design.concepts.null_type">
2119       <info><title>Null Policy Classes</title></info>
2120
2121       <para>
2122         Associative containers are typically parametrized by various
2123         policies. For example, a hash-based associative container is
2124         parametrized by a hash-functor, transforming each key into an
2125         non-negative numerical type. Each such value is then further mapped
2126         into a position within the table. The mapping of a key into a
2127         position within the table is therefore a two-step process.
2128       </para>
2129
2130       <para>
2131         In some cases, instantiations are redundant. For example, when the
2132         keys are integers, it is possible to use a redundant hash policy,
2133         which transforms each key into its value.
2134       </para>
2135
2136       <para>
2137         In some other cases, these policies are irrelevant.  For example, a
2138         hash-based associative container might transform keys into positions
2139         within a table by a different method than the two-step method
2140         described above. In such a case, the hash functor is simply
2141         irrelevant.
2142       </para>
2143
2144       <para>
2145         When a policy is either redundant or irrelevant, it can be replaced
2146         by <classname>null_type</classname>.
2147       </para>
2148
2149       <para>
2150         For example, a <emphasis>set</emphasis> is an associative
2151         container with one of its template parameters (the one for the
2152         mapped type) replaced with <classname>null_type</classname>. Other
2153         places simplifications are made possible with this technique
2154         include node updates in tree and trie data structures, and hash
2155         and probe functions for hash data structures.
2156       </para>
2157     </section>
2158
2159     <section xml:id="pbds.design.concepts.associative_semantics">
2160       <info><title>Map and Set Semantics</title></info>
2161
2162       <section xml:id="concepts.associative_semantics.set_vs_map">
2163         <info>
2164           <title>
2165             Distinguishing Between Maps and Sets
2166           </title>
2167         </info>
2168
2169         <para>
2170           Anyone familiar with the standard knows that there are four kinds
2171           of associative containers: maps, sets, multimaps, and
2172           multisets. The map datatype associates each key to
2173           some data.
2174         </para>
2175
2176         <para>
2177           Sets are associative containers that simply store keys -
2178           they do not map them to anything. In the standard, each map class
2179           has a corresponding set class. E.g.,
2180           <classname>std::map&lt;int, char&gt;</classname> maps each
2181           <classname>int</classname> to a <classname>char</classname>, but
2182           <classname>std::set&lt;int, char&gt;</classname> simply stores
2183           <classname>int</classname>s. In this library, however, there are no
2184           distinct classes for maps and sets. Instead, an associative
2185           container's <classname>Mapped</classname> template parameter is a policy: if
2186           it is instantiated by <classname>null_type</classname>, then it
2187           is a "set"; otherwise, it is a "map". E.g.,
2188         </para>
2189         <programlisting>
2190           cc_hash_table&lt;int, char&gt;
2191         </programlisting>
2192         <para>
2193           is a "map" mapping each <type>int</type> value to a <type>
2194           char</type>, but
2195         </para>
2196         <programlisting>
2197           cc_hash_table&lt;int, null_type&gt;
2198         </programlisting>
2199         <para>
2200           is a type that uniquely stores <type>int</type> values.
2201         </para>
2202         <para>Once the <classname>Mapped</classname> template parameter is instantiated
2203         by <classname>null_type</classname>, then
2204         the "set" acts very similarly to the standard's sets - it does not
2205         map each key to a distinct <classname>null_type</classname> object. Also,
2206         , the container's <type>value_type</type> is essentially
2207         its <type>key_type</type> - just as with the standard's sets
2208         .</para>
2209
2210         <para>
2211           The standard's multimaps and multisets allow, respectively,
2212           non-uniquely mapping keys and non-uniquely storing keys. As
2213           discussed, the
2214           reasons why this might be necessary are 1) that a key might be
2215           decomposed into a primary key and a secondary key, 2) that a
2216           key might appear more than once, or 3) any arbitrary
2217           combination of 1)s and 2)s. Correspondingly,
2218           one should use 1) "maps" mapping primary keys to secondary
2219           keys, 2) "maps" mapping keys to size types, or 3) any arbitrary
2220           combination of 1)s and 2)s. Thus, for example, an
2221           <classname>std::multiset&lt;int&gt;</classname> might be used to store
2222           multiple instances of integers, but using this library's
2223           containers, one might use
2224         </para>
2225         <programlisting>
2226           tree&lt;int, size_t&gt;
2227         </programlisting>
2228
2229         <para>
2230           i.e., a <classname>map</classname> of <type>int</type>s to
2231           <type>size_t</type>s.
2232         </para>
2233         <para>
2234           These "multimaps" and "multisets" might be confusing to
2235           anyone familiar with the standard's <classname>std::multimap</classname> and
2236           <classname>std::multiset</classname>, because there is no clear
2237           correspondence between the two. For example, in some cases
2238           where one uses <classname>std::multiset</classname> in the standard, one might use
2239           in this library a "multimap" of "multisets" - i.e., a
2240           container that maps primary keys each to an associative
2241           container that maps each secondary key to the number of times
2242           it occurs.
2243         </para>
2244
2245         <para>
2246           When one uses a "multimap," one should choose with care the
2247           type of container used for secondary keys.
2248         </para>
2249       </section> <!-- map vs set -->
2250
2251
2252       <section xml:id="concepts.associative_semantics.multi">
2253         <info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>
2254
2255         <para>
2256           Brace onself: this library does not contain containers like
2257           <classname>std::multimap</classname> or
2258           <classname>std::multiset</classname>. Instead, these data
2259           structures can be synthesized via manipulation of the
2260           <classname>Mapped</classname> template parameter.
2261         </para>
2262         <para>
2263           One maps the unique part of a key - the primary key, into an
2264           associative-container of the (originally) non-unique parts of
2265           the key - the secondary key. A primary associative-container
2266           is an associative container of primary keys; a secondary
2267           associative-container is an associative container of
2268           secondary keys.
2269         </para>
2270
2271         <para>
2272           Stepping back a bit, and starting in from the beginning.
2273         </para>
2274
2275
2276         <para>
2277           Maps (or sets) allow mapping (or storing) unique-key values.
2278           The standard library also supplies associative containers which
2279           map (or store) multiple values with equivalent keys:
2280           <classname>std::multimap</classname>, <classname>std::multiset</classname>,
2281           <classname>std::tr1::unordered_multimap</classname>, and
2282           <classname>unordered_multiset</classname>. We first discuss how these might
2283           be used, then why we think it is best to avoid them.
2284         </para>
2285
2286         <para>
2287           Suppose one builds a simple bank-account application that
2288           records for each client (identified by an <classname>std::string</classname>)
2289           and account-id (marked by an <type>unsigned long</type>) -
2290           the balance in the account (described by a
2291           <type>float</type>). Suppose further that ordering this
2292           information is not useful, so a hash-based container is
2293           preferable to a tree based container. Then one can use
2294         </para>
2295
2296         <programlisting>
2297           std::tr1::unordered_map&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2298         </programlisting>
2299
2300         <para>
2301           which hashes every combination of client and account-id. This
2302           might work well, except for the fact that it is now impossible
2303           to efficiently list all of the accounts of a specific client
2304           (this would practically require iterating over all
2305           entries). Instead, one can use
2306         </para>
2307
2308         <programlisting>
2309           std::tr1::unordered_multimap&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
2310         </programlisting>
2311
2312         <para>
2313           which hashes every client, and decides equivalence based on
2314           client only. This will ensure that all accounts belonging to a
2315           specific user are stored consecutively.
2316         </para>
2317
2318         <para>
2319           Also, suppose one wants an integers' priority queue
2320           (a container that supports <function>push</function>,
2321           <function>pop</function>, and <function>top</function> operations, the last of which
2322           returns the largest <type>int</type>) that also supports
2323           operations such as <function>find</function> and <function>lower_bound</function>. A
2324           reasonable solution is to build an adapter over
2325           <classname>std::set&lt;int&gt;</classname>. In this adapter,
2326           <function>push</function> will just call the tree-based
2327           associative container's <function>insert</function> method; <function>pop</function>
2328           will call its <function>end</function> method, and use it to return the
2329           preceding element (which must be the largest). Then this might
2330           work well, except that the container object cannot hold
2331           multiple instances of the same integer (<function>push(4)</function>,
2332           will be a no-op if <constant>4</constant> is already in the
2333           container object). If multiple keys are necessary, then one
2334           might build the adapter over an
2335           <classname>std::multiset&lt;int&gt;</classname>.
2336         </para>
2337
2338         <para>
2339           The standard library's non-unique-mapping containers are useful
2340           when (1) a key can be decomposed in to a primary key and a
2341           secondary key, (2) a key is needed multiple times, or (3) any
2342           combination of (1) and (2).
2343         </para>
2344
2345         <para>
2346           The graphic below shows how the standard library's container
2347           design works internally; in this figure nodes shaded equally
2348           represent equivalent-key values. Equivalent keys are stored
2349           consecutively using the properties of the underlying data
2350           structure: binary search trees (label A) store equivalent-key
2351           values consecutively (in the sense of an in-order walk)
2352           naturally; collision-chaining hash tables (label B) store
2353           equivalent-key values in the same bucket, the bucket can be
2354           arranged so that equivalent-key values are consecutive.
2355         </para>
2356
2357         <figure>
2358           <title>Non-unique Mapping Standard Containers</title>
2359           <mediaobject>
2360             <imageobject>
2361               <imagedata align="center" format="PNG" scale="100"
2362                          fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_1.png"/>
2363             </imageobject>
2364             <textobject>
2365               <phrase>Non-unique Mapping Standard Containers</phrase>
2366             </textobject>
2367           </mediaobject>
2368         </figure>
2369
2370         <para>
2371           Put differently, the standards' non-unique mapping
2372           associative-containers are associative containers that map
2373           primary keys to linked lists that are embedded into the
2374           container. The graphic below shows again the two
2375           containers from the first graphic above, this time with
2376           the embedded linked lists of the grayed nodes marked
2377           explicitly.
2378         </para>
2379
2380         <figure xml:id="fig.pbds_embedded_lists_2">
2381           <title>
2382             Effect of embedded lists in
2383             <classname>std::multimap</classname>
2384           </title>
2385           <mediaobject>
2386             <imageobject>
2387               <imagedata align="center" format="PNG" scale="100"
2388                          fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_2.png"/>
2389             </imageobject>
2390             <textobject>
2391               <phrase>
2392                 Effect of embedded lists in
2393                 <classname>std::multimap</classname>
2394               </phrase>
2395             </textobject>
2396           </mediaobject>
2397         </figure>
2398
2399         <para>
2400           These embedded linked lists have several disadvantages.
2401         </para>
2402
2403         <orderedlist>
2404           <listitem>
2405             <para>
2406               The underlying data structure embeds the linked lists
2407               according to its own consideration, which means that the
2408               search path for a value might include several different
2409               equivalent-key values. For example, the search path for the
2410               the black node in either of the first graphic, labels A or B,
2411               includes more than a single gray node.
2412             </para>
2413           </listitem>
2414
2415           <listitem>
2416             <para>
2417               The links of the linked lists are the underlying data
2418               structures' nodes, which typically are quite structured.  In
2419               the case of tree-based containers (the grapic above, label
2420               B), each "link" is actually a node with three pointers (one
2421               to a parent and two to children), and a
2422               relatively-complicated iteration algorithm. The linked
2423               lists, therefore, can take up quite a lot of memory, and
2424               iterating over all values equal to a given key (through the
2425               return value of the standard
2426               library's <function>equal_range</function>) can be
2427               expensive.
2428             </para>
2429           </listitem>
2430
2431           <listitem>
2432             <para>
2433               The primary key is stored multiply; this uses more memory.
2434             </para>
2435           </listitem>
2436
2437           <listitem>
2438             <para>
2439               Finally, the interface of this design excludes several
2440               useful underlying data structures. Of all the unordered
2441               self-organizing data structures, practically only
2442               collision-chaining hash tables can (efficiently) guarantee
2443               that equivalent-key values are stored consecutively.
2444             </para>
2445           </listitem>
2446         </orderedlist>
2447
2448         <para>
2449           The above reasons hold even when the ratio of secondary keys to
2450           primary keys (or average number of identical keys) is small, but
2451           when it is large, there are more severe problems:
2452         </para>
2453
2454         <orderedlist>
2455           <listitem>
2456             <para>
2457               The underlying data structures order the links inside each
2458               embedded linked-lists according to their internal
2459               considerations, which effectively means that each of the
2460               links is unordered. Irrespective of the underlying data
2461               structure, searching for a specific value can degrade to
2462               linear complexity.
2463             </para>
2464           </listitem>
2465
2466           <listitem>
2467             <para>
2468               Similarly to the above point, it is impossible to apply
2469               to the secondary keys considerations that apply to primary
2470               keys. For example, it is not possible to maintain secondary
2471               keys by sorted order.
2472             </para>
2473           </listitem>
2474
2475           <listitem>
2476             <para>
2477               While the interface "understands" that all equivalent-key
2478               values constitute a distinct list (through
2479               <function>equal_range</function>), the underlying data
2480               structure typically does not. This means that operations such
2481               as erasing from a tree-based container all values whose keys
2482               are equivalent to a a given key can be super-linear in the
2483               size of the tree; this is also true also for several other
2484               operations that target a specific list.
2485             </para>
2486           </listitem>
2487
2488         </orderedlist>
2489
2490         <para>
2491           In this library, all associative containers map
2492           (or store) unique-key values. One can (1) map primary keys to
2493           secondary associative-containers (containers of
2494           secondary keys) or non-associative containers (2) map identical
2495           keys to a size-type representing the number of times they
2496           occur, or (3) any combination of (1) and (2). Instead of
2497           allowing multiple equivalent-key values, this library
2498           supplies associative containers based on underlying
2499           data structures that are suitable as secondary
2500           associative-containers.
2501         </para>
2502
2503         <para>
2504           In the figure below, labels A and B show the equivalent
2505           underlying data structures in this library, as mapped to the
2506           first graphic above. Labels A and B, respectively. Each shaded
2507           box represents some size-type or secondary
2508           associative-container.
2509         </para>
2510
2511         <figure>
2512           <title>Non-unique Mapping Containers</title>
2513           <mediaobject>
2514             <imageobject>
2515               <imagedata align="center" format="PNG" scale="100"
2516                          fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_embedded_lists_3.png"/>
2517             </imageobject>
2518             <textobject>
2519               <phrase>Non-unique Mapping Containers</phrase>
2520             </textobject>
2521           </mediaobject>
2522         </figure>
2523
2524         <para>
2525           In the first example above, then, one would use an associative
2526           container mapping each user to an associative container which
2527           maps each application id to a start time (see
2528           <filename>example/basic_multimap.cc</filename>); in the second
2529           example, one would use an associative container mapping
2530           each <classname>int</classname> to some size-type indicating the
2531           number of times it logically occurs
2532           (see <filename>example/basic_multiset.cc</filename>.
2533         </para>
2534
2535         <para>
2536           See the discussion in list-based container types for containers
2537           especially suited as secondary associative-containers.
2538         </para>
2539       </section>
2540
2541     </section> <!-- map and set semantics -->
2542
2543     <section xml:id="pbds.design.concepts.iterator_semantics">
2544       <info><title>Iterator Semantics</title></info>
2545
2546       <section xml:id="concepts.iterator_semantics.point_and_range">
2547         <info><title>Point and Range Iterators</title></info>
2548
2549         <para>
2550           Iterator concepts are bifurcated in this design, and are
2551           comprised of point-type and range-type iteration.
2552         </para>
2553
2554         <para>
2555           A point-type iterator is an iterator that refers to a specific
2556           element as returned through an
2557           associative-container's <function>find</function> method.
2558         </para>
2559
2560         <para>
2561           A range-type iterator is an iterator that is used to go over a
2562           sequence of elements, as returned by a container's
2563           <function>find</function> method.
2564         </para>
2565
2566         <para>
2567           A point-type method is a method that
2568           returns a point-type iterator; a range-type method is a method
2569           that returns a range-type iterator.
2570         </para>
2571
2572         <para>For most containers, these types are synonymous; for
2573         self-organizing containers, such as hash-based containers or
2574         priority queues, these are inherently different (in any
2575         implementation, including that of C++ standard library
2576         components), but in this design, it is made explicit. They are
2577         distinct types.
2578         </para>
2579       </section>
2580
2581
2582       <section xml:id="concepts.iterator_semantics.both">
2583         <info><title>Distinguishing Point and Range Iterators</title></info>
2584
2585         <para>When using this library, is necessary to differentiate
2586         between two types of methods and iterators: point-type methods and
2587         iterators, and range-type methods and iterators. Each associative
2588         container's interface includes the methods:</para>
2589         <programlisting>
2590           point_const_iterator
2591           find(const_key_reference r_key) const;
2592
2593           point_iterator
2594           find(const_key_reference r_key);
2595
2596           std::pair&lt;point_iterator,bool&gt;
2597           insert(const_reference r_val);
2598         </programlisting>
2599
2600         <para>The relationship between these iterator types varies between
2601         container types. The figure below
2602         shows the most general invariant between point-type and
2603         range-type iterators: In <emphasis>A</emphasis> <literal>iterator</literal>, can
2604         always be converted to <literal>point_iterator</literal>. In <emphasis>B</emphasis>
2605         shows invariants for order-preserving containers: point-type
2606         iterators are synonymous with range-type iterators.
2607         Orthogonally,  <emphasis>C</emphasis>shows invariants for "set"
2608         containers: iterators are synonymous with const iterators.</para>
2609
2610         <figure>
2611           <title>Point Iterator Hierarchy</title>
2612           <mediaobject>
2613             <imageobject>
2614               <imagedata align="center" format="PNG" scale="100"
2615                          fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_point_iterator_hierarchy.png"/>
2616             </imageobject>
2617             <textobject>
2618               <phrase>Point Iterator Hierarchy</phrase>
2619             </textobject>
2620           </mediaobject>
2621         </figure>
2622
2623
2624         <para>Note that point-type iterators in self-organizing containers
2625         (hash-based associative containers) lack movement
2626         operators, such as <literal>operator++</literal> - in fact, this
2627         is the reason why this library differentiates from the standard C++ librarys
2628         design on this point.</para>
2629
2630         <para>Typically, one can determine an iterator's movement
2631         capabilities using
2632         <literal>std::iterator_traits&lt;It&gt;iterator_category</literal>,
2633         which is a <literal>struct</literal> indicating the iterator's
2634         movement capabilities. Unfortunately, none of the standard predefined
2635         categories reflect a pointer's <emphasis>not</emphasis> having any
2636         movement capabilities whatsoever. Consequently,
2637         <literal>pb_ds</literal> adds a type
2638         <literal>trivial_iterator_tag</literal> (whose name is taken from
2639         a concept in C++ standardese, which is the category of iterators
2640         with no movement capabilities.) All other standard C++ library
2641         tags, such as <literal>forward_iterator_tag</literal> retain their
2642         common use.</para>
2643
2644       </section>
2645
2646       <section xml:id="pbds.design.concepts.invalidation">
2647         <info><title>Invalidation Guarantees</title></info>
2648         <para>
2649           If one manipulates a container object, then iterators previously
2650           obtained from it can be invalidated. In some cases a
2651           previously-obtained iterator cannot be de-referenced; in other cases,
2652           the iterator's next or previous element might have changed
2653           unpredictably. This corresponds exactly to the question whether a
2654           point-type or range-type iterator (see previous concept) is valid or
2655           not. In this design, one can query a container (in compile time) about
2656           its invalidation guarantees.
2657         </para>
2658
2659
2660         <para>
2661           Given three different types of associative containers, a modifying
2662           operation (in that example, <function>erase</function>) invalidated
2663           iterators in three different ways: the iterator of one container
2664           remained completely valid - it could be de-referenced and
2665           incremented; the iterator of a different container could not even be
2666           de-referenced; the iterator of the third container could be
2667           de-referenced, but its "next" iterator changed unpredictably.
2668         </para>
2669
2670         <para>
2671           Distinguishing between find and range types allows fine-grained
2672           invalidation guarantees, because these questions correspond exactly
2673           to the question of whether point-type iterators and range-type
2674           iterators are valid. The graphic below shows tags corresponding to
2675           different types of invalidation guarantees.
2676         </para>
2677
2678         <figure>
2679           <title>Invalidation Guarantee Tags Hierarchy</title>
2680           <mediaobject>
2681             <imageobject>
2682               <imagedata align="center" format="PDF" scale="75"
2683                          fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
2684             </imageobject>
2685             <imageobject>
2686               <imagedata align="center" format="PNG" scale="100"
2687                          fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_invalidation_tag_hierarchy.png"/>
2688             </imageobject>
2689             <textobject>
2690               <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
2691             </textobject>
2692           </mediaobject>
2693         </figure>
2694
2695         <itemizedlist>
2696           <listitem>
2697             <para>
2698               <classname>basic_invalidation_guarantee</classname>
2699               corresponds to a basic guarantee that a point-type iterator,
2700               a found pointer, or a found reference, remains valid as long
2701               as the container object is not modified.
2702             </para>
2703           </listitem>
2704
2705           <listitem>
2706             <para>
2707               <classname>point_invalidation_guarantee</classname>
2708               corresponds to a guarantee that a point-type iterator, a
2709               found pointer, or a found reference, remains valid even if
2710               the container object is modified.
2711             </para>
2712           </listitem>
2713
2714           <listitem>
2715             <para>
2716               <classname>range_invalidation_guarantee</classname>
2717               corresponds to a guarantee that a range-type iterator remains
2718               valid even if the container object is modified.
2719             </para>
2720           </listitem>
2721         </itemizedlist>
2722
2723         <para>To find the invalidation guarantee of a
2724         container, one can use</para>
2725         <programlisting>
2726           typename container_traits&lt;Cntnr&gt;::invalidation_guarantee
2727         </programlisting>
2728
2729         <para>Note that this hierarchy corresponds to the logic it
2730         represents: if a container has range-invalidation guarantees,
2731         then it must also have find invalidation guarantees;
2732         correspondingly, its invalidation guarantee (in this case
2733         <classname>range_invalidation_guarantee</classname>)
2734         can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
2735         This means that this this hierarchy can be used easily using
2736         standard metaprogramming techniques, by specializing on the
2737         type of <literal>invalidation_guarantee</literal>.</para>
2738
2739         <para>
2740           These types of problems were addressed, in a more general
2741           setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
2742           our opinion, an invalidation-guarantee hierarchy would solve
2743           these problems in all container types - not just associative
2744           containers.
2745         </para>
2746
2747       </section>
2748     </section> <!-- iterator semantics -->
2749
2750     <section xml:id="pbds.design.concepts.genericity">
2751       <info><title>Genericity</title></info>
2752
2753       <para>
2754         The design attempts to address the following problem of
2755         data-structure genericity. When writing a function manipulating
2756         a generic container object, what is the behavior of the object?
2757         Suppose one writes
2758       </para>
2759       <programlisting>
2760         template&lt;typename Cntnr&gt;
2761         void
2762         some_op_sequence(Cntnr &amp;r_container)
2763         {
2764         ...
2765         }
2766       </programlisting>
2767
2768       <para>
2769         then one needs to address the following questions in the body
2770         of <function>some_op_sequence</function>:
2771       </para>
2772
2773       <itemizedlist>
2774         <listitem>
2775           <para>
2776             Which types and methods does <literal>Cntnr</literal> support?
2777             Containers based on hash tables can be queries for the
2778             hash-functor type and object; this is meaningless for tree-based
2779             containers. Containers based on trees can be split, joined, or
2780             can erase iterators and return the following iterator; this
2781             cannot be done by hash-based containers.
2782           </para>
2783         </listitem>
2784
2785         <listitem>
2786           <para>
2787             What are the exception and invalidation guarantees
2788             of <literal>Cntnr</literal>? A container based on a probing
2789             hash-table invalidates all iterators when it is modified; this
2790             is not the case for containers based on node-based
2791             trees. Containers based on a node-based tree can be split or
2792             joined without exceptions; this is not the case for containers
2793             based on vector-based trees.
2794           </para>
2795         </listitem>
2796
2797         <listitem>
2798           <para>
2799             How does the container maintain its elements? Tree-based and
2800             Trie-based containers store elements by key order; others,
2801             typically, do not. A container based on a splay trees or lists
2802             with update policies "cache" "frequently accessed" elements;
2803             containers based on most other underlying data structures do
2804             not.
2805           </para>
2806         </listitem>
2807         <listitem>
2808           <para>
2809             How does one query a container about characteristics and
2810             capabilities? What is the relationship between two different
2811             data structures, if anything?
2812           </para>
2813         </listitem>
2814       </itemizedlist>
2815
2816       <para>The remainder of this section explains these issues in
2817       detail.</para>
2818
2819
2820       <section xml:id="concepts.genericity.tag">
2821         <info><title>Tag</title></info>
2822         <para>
2823           Tags are very useful for manipulating generic types. For example, if
2824           <literal>It</literal> is an iterator class, then <literal>typename
2825           It::iterator_category</literal> or <literal>typename
2826           std::iterator_traits&lt;It&gt;::iterator_category</literal> will
2827           yield its category, and <literal>typename
2828           std::iterator_traits&lt;It&gt;::value_type</literal> will yield its
2829           value type.
2830         </para>
2831
2832         <para>
2833           This library contains a container tag hierarchy corresponding to the
2834           diagram below.
2835         </para>
2836
2837         <figure>
2838           <title>Container Tag Hierarchy</title>
2839           <mediaobject>
2840             <imageobject>
2841               <imagedata align="center" format="PDF" scale="75"
2842                          fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2843             </imageobject>
2844             <imageobject>
2845               <imagedata align="center" format="PNG" scale="100"
2846                          fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_container_tag_hierarchy.png"/>
2847             </imageobject>
2848             <textobject>
2849               <phrase>Container Tag Hierarchy</phrase>
2850             </textobject>
2851           </mediaobject>
2852         </figure>
2853
2854         <para>
2855           Given any container <type>Cntnr</type>, the tag of
2856           the underlying data structure can be found via <literal>typename
2857           Cntnr::container_category</literal>.
2858         </para>
2859
2860       </section> <!-- tag -->
2861
2862       <section xml:id="concepts.genericity.traits">
2863         <info><title>Traits</title></info>
2864         <para></para>
2865
2866         <para>Additionally, a traits mechanism can be used to query a
2867         container type for its attributes. Given any container
2868         <literal>Cntnr</literal>, then <literal>&lt;Cntnr&gt;</literal>
2869         is a traits class identifying the properties of the
2870         container.</para>
2871
2872         <para>To find if a container can throw when a key is erased (which
2873         is true for vector-based trees, for example), one can
2874         use
2875         </para>
2876         <programlisting>container_traits&lt;Cntnr&gt;::erase_can_throw</programlisting>
2877
2878         <para>
2879           Some of the definitions in <classname>container_traits</classname>
2880           are dependent on other
2881           definitions. If <classname>container_traits&lt;Cntnr&gt;::order_preserving</classname>
2882           is <constant>true</constant> (which is the case for containers
2883           based on trees and tries), then the container can be split or
2884           joined; in this
2885           case, <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2886           indicates whether splits or joins can throw exceptions (which is
2887           true for vector-based trees);
2888           otherwise <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2889           will yield a compilation error. (This is somewhat similar to a
2890           compile-time version of the COM model).
2891         </para>
2892
2893       </section> <!-- traits -->
2894
2895     </section> <!-- genericity -->
2896   </section> <!-- concepts -->
2897
2898   <section xml:id="pbds.design.container">
2899     <info><title>By Container</title></info>
2900
2901     <!-- hash -->
2902     <section xml:id="pbds.design.container.hash">
2903       <info><title>hash</title></info>
2904
2905       <!--
2906
2907 // hash policies
2908 /// general terms / background
2909 /// range hashing policies
2910 /// ranged-hash policies
2911 /// implementation
2912
2913 // resize policies
2914 /// general
2915 /// size policies
2916 /// trigger policies
2917 /// implementation
2918
2919 // policy interactions
2920 /// probe/size/trigger
2921 /// hash/trigger
2922 /// eq/hash/storing hash values
2923 /// size/load-check trigger
2924       -->
2925       <section xml:id="container.hash.interface">
2926         <info><title>Interface</title></info>
2927
2928
2929
2930         <para>
2931           The collision-chaining hash-based container has the
2932         following declaration.</para>
2933         <programlisting>
2934           template&lt;
2935           typename Key,
2936           typename Mapped,
2937           typename Hash_Fn = std::hash&lt;Key&gt;,
2938           typename Eq_Fn = std::equal_to&lt;Key&gt;,
2939           typename Comb_Hash_Fn =  direct_mask_range_hashing&lt;&gt;
2940           typename Resize_Policy = default explained below.
2941           bool Store_Hash = false,
2942           typename Allocator = std::allocator&lt;char&gt; &gt;
2943           class cc_hash_table;
2944         </programlisting>
2945
2946         <para>The parameters have the following meaning:</para>
2947
2948         <orderedlist>
2949           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2950
2951           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2952
2953           <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2954
2955           <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2956
2957           <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing_functor;
2958           it describes how to translate hash values into positions
2959           within the table. </para></listitem>
2960
2961           <listitem><para><classname>Resize_Policy</classname> describes how a container object
2962           should change its internal size. </para></listitem>
2963
2964           <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2965           should be stored with each entry. </para></listitem>
2966
2967           <listitem><para><classname>Allocator</classname> is an allocator
2968           type.</para></listitem>
2969         </orderedlist>
2970
2971         <para>The probing hash-based container has the following
2972         declaration.</para>
2973         <programlisting>
2974           template&lt;
2975           typename Key,
2976           typename Mapped,
2977           typename Hash_Fn = std::hash&lt;Key&gt;,
2978           typename Eq_Fn = std::equal_to&lt;Key&gt;,
2979           typename Comb_Probe_Fn = direct_mask_range_hashing&lt;&gt;
2980           typename Probe_Fn = default explained below.
2981           typename Resize_Policy = default explained below.
2982           bool Store_Hash = false,
2983           typename Allocator =  std::allocator&lt;char&gt; &gt;
2984           class gp_hash_table;
2985         </programlisting>
2986
2987         <para>The parameters are identical to those of the
2988         collision-chaining container, except for the following.</para>
2989
2990         <orderedlist>
2991           <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2992           sequence into a sequence of positions within the table.</para></listitem>
2993
2994           <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2995         </orderedlist>
2996
2997         <para>Some of the default template values depend on the values of
2998         other parameters, and are explained below.</para>
2999
3000       </section>
3001       <section xml:id="container.hash.details">
3002         <info><title>Details</title></info>
3003
3004         <section xml:id="container.hash.details.hash_policies">
3005           <info><title>Hash Policies</title></info>
3006
3007           <section xml:id="details.hash_policies.general">
3008             <info><title>General</title></info>
3009
3010             <para>Following is an explanation of some functions which hashing
3011             involves. The graphic below illustrates the discussion.</para>
3012
3013             <figure>
3014               <title>Hash functions, ranged-hash functions, and
3015               range-hashing functions</title>
3016               <mediaobject>
3017                 <imageobject>
3018                   <imagedata align="center" format="PNG" scale="100"
3019                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
3020                 </imageobject>
3021                 <textobject>
3022                   <phrase>Hash functions, ranged-hash functions, and
3023                   range-hashing functions</phrase>
3024                 </textobject>
3025               </mediaobject>
3026             </figure>
3027
3028             <para>Let U be a domain (e.g., the integers, or the
3029             strings of 3 characters). A hash-table algorithm needs to map
3030             elements of U "uniformly" into the range [0,..., m -
3031             1] (where m is a non-negative integral value, and
3032             is, in general, time varying). I.e., the algorithm needs
3033             a ranged-hash function</para>
3034
3035             <para>
3036               f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3037             </para>
3038
3039             <para>such that for any u in U ,</para>
3040
3041             <para>0 ≤ f(u, m) ≤ m - 1</para>
3042
3043             <para>and which has "good uniformity" properties (say
3044             <xref linkend="biblio.knuth98sorting"/>.)
3045             One
3046             common solution is to use the composition of the hash
3047             function</para>
3048
3049             <para>h : U → Z<subscript>+</subscript> ,</para>
3050
3051             <para>which maps elements of U into the non-negative
3052             integrals, and</para>
3053
3054             <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
3055             Z<subscript>+</subscript>,</para>
3056
3057             <para>which maps a non-negative hash value, and a non-negative
3058             range upper-bound into a non-negative integral in the range
3059             between 0 (inclusive) and the range upper bound (exclusive),
3060             i.e., for any r in Z<subscript>+</subscript>,</para>
3061
3062             <para>0 ≤ g(r, m) ≤ m - 1</para>
3063
3064
3065             <para>The resulting ranged-hash function, is</para>
3066
3067             <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3068             <equation>
3069               <title>Ranged Hash Function</title>
3070               <mathphrase>
3071                 f(u , m) = g(h(u), m)
3072               </mathphrase>
3073             </equation>
3074
3075             <para>From the above, it is obvious that given g and
3076             h, f can always be composed (however the converse
3077             is not true). The standard's hash-based containers allow specifying
3078             a hash function, and use a hard-wired range-hashing function;
3079             the ranged-hash function is implicitly composed.</para>
3080
3081             <para>The above describes the case where a key is to be mapped
3082             into a single position within a hash table, e.g.,
3083             in a collision-chaining table. In other cases, a key is to be
3084             mapped into a sequence of positions within a table,
3085             e.g., in a probing table. Similar terms apply in this
3086             case: the table requires a ranged probe function,
3087             mapping a key into a sequence of positions withing the table.
3088             This is typically achieved by composing a hash function
3089             mapping the key into a non-negative integral type, a
3090             probe function transforming the hash value into a
3091             sequence of hash values, and a range-hashing function
3092             transforming the sequence of hash values into a sequence of
3093             positions.</para>
3094
3095           </section>
3096
3097           <section xml:id="details.hash_policies.range">
3098             <info><title>Range Hashing</title></info>
3099
3100             <para>Some common choices for range-hashing functions are the
3101             division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3102             as</para>
3103
3104             <equation>
3105               <title>Range-Hashing, Division Method</title>
3106               <mathphrase>
3107                 g(r, m) = r mod m
3108               </mathphrase>
3109             </equation>
3110
3111
3112
3113             <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>
3114
3115             <para>and</para>
3116
3117             <para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>
3118
3119             <para>respectively, for some positive integrals u and
3120             v (typically powers of 2), and some a. Each of
3121             these range-hashing functions works best for some different
3122             setting.</para>
3123
3124             <para>The division method (see above) is a
3125             very common choice. However, even this single method can be
3126             implemented in two very different ways. It is possible to
3127             implement using the low
3128             level % (modulo) operation (for any m), or the
3129             low level &amp; (bit-mask) operation (for the case where
3130             m is a power of 2), i.e.,</para>
3131
3132             <equation>
3133               <title>Division via Prime Modulo</title>
3134               <mathphrase>
3135                 g(r, m) = r % m
3136               </mathphrase>
3137             </equation>
3138
3139             <para>and</para>
3140
3141             <equation>
3142               <title>Division via Bit Mask</title>
3143               <mathphrase>
3144                 g(r, m) = r &amp; m - 1, (with m =
3145                 2<superscript>k</superscript> for some k)
3146               </mathphrase>
3147             </equation>
3148
3149
3150             <para>respectively.</para>
3151
3152             <para>The % (modulo) implementation has the advantage that for
3153             m a prime far from a power of 2, g(r, m) is
3154             affected by all the bits of r (minimizing the chance of
3155             collision). It has the disadvantage of using the costly modulo
3156             operation. This method is hard-wired into SGI's implementation
3157             .</para>
3158
3159             <para>The &amp; (bit-mask) implementation has the advantage of
3160             relying on the fast bit-wise and operation. It has the
3161             disadvantage that for g(r, m) is affected only by the
3162             low order bits of r. This method is hard-wired into
3163             Dinkumware's implementation.</para>
3164
3165
3166           </section>
3167
3168           <section xml:id="details.hash_policies.ranged">
3169             <info><title>Ranged Hash</title></info>
3170
3171             <para>In cases it is beneficial to allow the
3172             client to directly specify a ranged-hash hash function. It is
3173             true, that the writer of the ranged-hash function cannot rely
3174             on the values of m having specific numerical properties
3175             suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
3176             the values of m are determined by a resize policy with
3177             possibly orthogonal considerations.</para>
3178
3179             <para>There are two cases where a ranged-hash function can be
3180             superior. The firs is when using perfect hashing: the
3181             second is when the values of m can be used to estimate
3182             the "general" number of distinct values required. This is
3183             described in the following.</para>
3184
3185             <para>Let</para>
3186
3187             <para>
3188               s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3189             </para>
3190
3191             <para>be a string of t characters, each of which is from
3192             domain S. Consider the following ranged-hash
3193             function:</para>
3194             <equation>
3195               <title>
3196                 A Standard String Hash Function
3197               </title>
3198               <mathphrase>
3199                 f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
3200                 0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3201               </mathphrase>
3202             </equation>
3203
3204
3205             <para>where a is some non-negative integral value. This is
3206             the standard string-hashing function used in SGI's
3207             implementation (with a = 5). Its advantage is that
3208             it takes into account all of the characters of the string.</para>
3209
3210             <para>Now assume that s is the string representation of a
3211             of a long DNA sequence (and so S = {'A', 'C', 'G',
3212             'T'}). In this case, scanning the entire string might be
3213             prohibitively expensive. A possible alternative might be to use
3214             only the first k characters of the string, where</para>
3215
3216             <para>|S|<superscript>k</superscript> ≥ m ,</para>
3217
3218             <para>i.e., using the hash function</para>
3219
3220             <equation>
3221               <title>
3222                 Only k String DNA Hash
3223               </title>
3224               <mathphrase>
3225                 f<subscript>2</subscript>(s, m) = ∑ <subscript>i
3226                 = 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3227               </mathphrase>
3228             </equation>
3229
3230             <para>requiring scanning over only</para>
3231
3232             <para>k = log<subscript>4</subscript>( m )</para>
3233
3234             <para>characters.</para>
3235
3236             <para>Other more elaborate hash-functions might scan k
3237             characters starting at a random position (determined at each
3238             resize), or scanning k random positions (determined at
3239             each resize), i.e., using</para>
3240
3241             <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
3242             r</subscript>0<superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
3243             a<superscript>i</superscript> mod m ,</para>
3244
3245             <para>or</para>
3246
3247             <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
3248             1</superscript> s<subscript>r</subscript>i a<superscript>r<subscript>i</subscript></superscript> mod
3249             m ,</para>
3250
3251             <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
3252             each in the (inclusive) range [0,...,t-1].</para>
3253
3254             <para>It should be noted that the above functions cannot be
3255             decomposed as per a ranged hash composed of hash and range hashing.</para>
3256
3257
3258           </section>
3259
3260           <section xml:id="details.hash_policies.implementation">
3261             <info><title>Implementation</title></info>
3262
3263             <para>This sub-subsection describes the implementation of
3264             the above in this library. It first explains range-hashing
3265             functions in collision-chaining tables, then ranged-hash
3266             functions in collision-chaining tables, then probing-based
3267             tables, and finally lists the relevant classes in this
3268             library.</para>
3269
3270             <section xml:id="hash_policies.implementation.collision-chaining">
3271               <info><title>
3272                 Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3273               </title></info>
3274
3275
3276               <para><classname>cc_hash_table</classname> is
3277               parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3278               hash functor and a combining hash functor, respectively.</para>
3279
3280               <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3281               range-hashing functor. <classname>cc_hash_table</classname>
3282               synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
3283               <classname>Comb_Hash_Fn</classname>. The figure below shows an <classname>insert</classname> sequence
3284               diagram for this case. The user inserts an element (point A),
3285               the container transforms the key into a non-negative integral
3286               using the hash functor (points B and C), and transforms the
3287               result into a position using the combining functor (points D
3288               and E).</para>
3289
3290               <figure>
3291                 <title>Insert hash sequence diagram</title>
3292                 <mediaobject>
3293                   <imageobject>
3294                     <imagedata align="center" format="PNG" scale="100"
3295                                fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_range_hashing_seq_diagram.png"/>
3296                   </imageobject>
3297                   <textobject>
3298                     <phrase>Insert hash sequence diagram</phrase>
3299                   </textobject>
3300                 </mediaobject>
3301               </figure>
3302
3303               <para>If <classname>cc_hash_table</classname>'s
3304               hash-functor, <classname>Hash_Fn</classname> is instantiated by <classname>null_type</classname> , then <classname>Comb_Hash_Fn</classname> is taken to be
3305               a ranged-hash function. The graphic below shows an <function>insert</function> sequence
3306               diagram. The user inserts an element (point A), the container
3307               transforms the key into a position using the combining functor
3308               (points B and C).</para>
3309
3310               <figure>
3311                 <title>Insert hash sequence diagram with a null policy</title>
3312                 <mediaobject>
3313                   <imageobject>
3314                     <imagedata align="center" format="PNG" scale="100"
3315                                fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_range_hashing_seq_diagram2.png"/>
3316                   </imageobject>
3317                   <textobject>
3318                     <phrase>Insert hash sequence diagram with a null policy</phrase>
3319                   </textobject>
3320                 </mediaobject>
3321               </figure>
3322
3323             </section>
3324
3325             <section xml:id="hash_policies.implementation.probe">
3326               <info><title>
3327                 Probing tables
3328               </title></info>
3329               <para><classname>gp_hash_table</classname> is parametrized by
3330               <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3331               and <classname>Comb_Probe_Fn</classname>. As before, if
3332               <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3333               are both <classname>null_type</classname>, then
3334               <classname>Comb_Probe_Fn</classname> is a ranged-probe
3335               functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3336               functor, <classname>Probe_Fn</classname> is a functor for offsets
3337               from a hash value, and <classname>Comb_Probe_Fn</classname>
3338               transforms a probe sequence into a sequence of positions within
3339               the table.</para>
3340
3341             </section>
3342
3343             <section xml:id="hash_policies.implementation.predefined">
3344               <info><title>
3345                 Pre-Defined Policies
3346               </title></info>
3347
3348               <para>This library contains some pre-defined classes
3349               implementing range-hashing and probing functions:</para>
3350
3351               <orderedlist>
3352                 <listitem><para><classname>direct_mask_range_hashing</classname>
3353                 and <classname>direct_mod_range_hashing</classname>
3354                 are range-hashing functions based on a bit-mask and a modulo
3355                 operation, respectively.</para></listitem>
3356
3357                 <listitem><para><classname>linear_probe_fn</classname>, and
3358                 <classname>quadratic_probe_fn</classname> are
3359                 a linear probe and a quadratic probe function,
3360                 respectively.</para></listitem>
3361               </orderedlist>
3362
3363               <para>
3364                 The graphic below shows the relationships.
3365               </para>
3366               <figure>
3367                 <title>Hash policy class diagram</title>
3368                 <mediaobject>
3369                   <imageobject>
3370                     <imagedata align="center" format="PNG" scale="100"
3371                                fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_hash_policy_cd.png"/>
3372                   </imageobject>
3373                   <textobject>
3374                     <phrase>Hash policy class diagram</phrase>
3375                   </textobject>
3376                 </mediaobject>
3377               </figure>
3378
3379
3380             </section>
3381
3382           </section> <!-- impl -->
3383
3384         </section>
3385
3386         <section xml:id="container.hash.details.resize_policies">
3387           <info><title>Resize Policies</title></info>
3388
3389           <section xml:id="resize_policies.general">
3390             <info><title>General</title></info>
3391
3392             <para>Hash-tables, as opposed to trees, do not naturally grow or
3393             shrink. It is necessary to specify policies to determine how
3394             and when a hash table should change its size. Usually, resize
3395             policies can be decomposed into orthogonal policies:</para>
3396
3397             <orderedlist>
3398               <listitem><para>A size policy indicating how a hash table
3399               should grow (e.g., it should multiply by powers of
3400               2).</para></listitem>
3401
3402               <listitem><para>A trigger policy indicating when a hash
3403               table should grow (e.g., a load factor is
3404               exceeded).</para></listitem>
3405             </orderedlist>
3406
3407           </section>
3408
3409           <section xml:id="resize_policies.size">
3410             <info><title>Size Policies</title></info>
3411
3412
3413             <para>Size policies determine how a hash table changes size. These
3414             policies are simple, and there are relatively few sensible
3415             options. An exponential-size policy (with the initial size and
3416             growth factors both powers of 2) works well with a mask-based
3417             range-hashing function, and is the
3418             hard-wired policy used by Dinkumware. A
3419             prime-list based policy works well with a modulo-prime range
3420             hashing function and is the hard-wired policy used by SGI's
3421             implementation.</para>
3422
3423           </section>
3424
3425           <section xml:id="resize_policies.trigger">
3426             <info><title>Trigger Policies</title></info>
3427
3428             <para>Trigger policies determine when a hash table changes size.
3429             Following is a description of two policies: load-check
3430             policies, and collision-check policies.</para>
3431
3432             <para>Load-check policies are straightforward. The user specifies
3433             two factors, Α<subscript>min</subscript> and
3434             Α<subscript>max</subscript>, and the hash table maintains the
3435             invariant that</para>
3436
3437             <para>Α<subscript>min</subscript> ≤ (number of
3438             stored elements) / (hash-table size) ≤
3439             Α<subscript>max</subscript><remark>load factor min max</remark></para>
3440
3441             <para>Collision-check policies work in the opposite direction of
3442             load-check policies. They focus on keeping the number of
3443             collisions moderate and hoping that the size of the table will
3444             not grow very large, instead of keeping a moderate load-factor
3445             and hoping that the number of collisions will be small. A
3446             maximal collision-check policy resizes when the longest
3447             probe-sequence grows too large.</para>
3448
3449             <para>Consider the graphic below. Let the size of the hash table
3450             be denoted by m, the length of a probe sequence be denoted by k,
3451             and some load factor be denoted by Α. We would like to
3452             calculate the minimal length of k, such that if there were Α
3453             m elements in the hash table, a probe sequence of length k would
3454             be found with probability at most 1/m.</para>
3455
3456             <figure>
3457               <title>Balls and bins</title>
3458               <mediaobject>
3459                 <imageobject>
3460                   <imagedata align="center" format="PNG" scale="100"
3461                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_balls_and_bins.png"/>
3462                 </imageobject>
3463                 <textobject>
3464                   <phrase>Balls and bins</phrase>
3465                 </textobject>
3466               </mediaobject>
3467             </figure>
3468
3469             <para>Denote the probability that a probe sequence of length
3470             k appears in bin i by p<subscript>i</subscript>, the
3471             length of the probe sequence of bin i by
3472             l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3473
3474
3475
3476             <equation>
3477               <title>
3478                 Probability of Probe Sequence of Length k
3479               </title>
3480               <mathphrase>
3481                 p<subscript>1</subscript> =
3482               </mathphrase>
3483             </equation>
3484
3485             <para>P(l<subscript>1</subscript> ≥ k) =</para>
3486
3487             <para>
3488               P(l<subscript>1</subscript> ≥ α ( 1 + k / α - 1) ≤ (a)
3489             </para>
3490
3491             <para>
3492               e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) /2)
3493             </para>
3494
3495             <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
3496             calculate the probability that some bin contains a probe
3497             sequence greater than k, we note that the
3498             l<subscript>i</subscript> are negatively-dependent
3499             (<xref linkend="biblio.dubhashi98neg"/>)
3500             . Let
3501             I(.) denote the indicator function. Then</para>
3502
3503             <equation>
3504               <title>
3505                 Probability Probe Sequence in Some Bin
3506               </title>
3507               <mathphrase>
3508                 P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
3509               </mathphrase>
3510             </equation>
3511
3512             <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
3513             I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =</para>
3514
3515             <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript> I (
3516             l<subscript>i</subscript> ≥ k ) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
3517             p<subscript>1</subscript>) - 1 ) ) ≤ (a)</para>
3518
3519             <para>e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
3520             - 1 ) <superscript>2</superscript> ) / 2 ) ,</para>
3521
3522             <para>where (a) follows from the fact that the Chernoff bound can
3523             be applied to negatively-dependent variables (<xref
3524             linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3525             equation into the second one, and equating with 1/m, we
3526             obtain</para>
3527
3528
3529             <para>k ~ √ ( 2 α ln 2 m ln(m) )
3530             ) .</para>
3531
3532           </section>
3533
3534           <section xml:id="resize_policies.impl">
3535             <info><title>Implementation</title></info>
3536
3537             <para>This sub-subsection describes the implementation of the
3538             above in this library. It first describes resize policies and
3539             their decomposition into trigger and size policies, then
3540             describes pre-defined classes, and finally discusses controlled
3541             access the policies' internals.</para>
3542
3543             <section xml:id="resize_policies.impl.decomposition">
3544               <info><title>Decomposition</title></info>
3545
3546
3547               <para>Each hash-based container is parametrized by a
3548               <classname>Resize_Policy</classname> parameter; the container derives
3549               <classname>public</classname>ly from <classname>Resize_Policy</classname>. For
3550               example:</para>
3551               <programlisting>
3552                 cc_hash_table&lt;typename Key,
3553                 typename Mapped,
3554                 ...
3555                 typename Resize_Policy
3556                 ...&gt; : public Resize_Policy
3557               </programlisting>
3558
3559               <para>As a container object is modified, it continuously notifies
3560               its <classname>Resize_Policy</classname> base of internal changes
3561               (e.g., collisions encountered and elements being
3562               inserted). It queries its <classname>Resize_Policy</classname> base whether
3563               it needs to be resized, and if so, to what size.</para>
3564
3565               <para>The graphic below shows a (possible) sequence diagram
3566               of an insert operation. The user inserts an element; the hash
3567               table notifies its resize policy that a search has started
3568               (point A); in this case, a single collision is encountered -
3569               the table notifies its resize policy of this (point B); the
3570               container finally notifies its resize policy that the search
3571               has ended (point C); it then queries its resize policy whether
3572               a resize is needed, and if so, what is the new size (points D
3573               to G); following the resize, it notifies the policy that a
3574               resize has completed (point H); finally, the element is
3575               inserted, and the policy notified (point I).</para>
3576
3577               <figure>
3578                 <title>Insert resize sequence diagram</title>
3579                 <mediaobject>
3580                   <imageobject>
3581                     <imagedata align="center" format="PNG" scale="100"
3582                                fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram1.png"/>
3583                   </imageobject>
3584                   <textobject>
3585                     <phrase>Insert resize sequence diagram</phrase>
3586                   </textobject>
3587                 </mediaobject>
3588               </figure>
3589
3590
3591               <para>In practice, a resize policy can be usually orthogonally
3592               decomposed to a size policy and a trigger policy. Consequently,
3593               the library contains a single class for instantiating a resize
3594               policy: <classname>hash_standard_resize_policy</classname>
3595               is parametrized by <classname>Size_Policy</classname> and
3596               <classname>Trigger_Policy</classname>, derives <classname>public</classname>ly from
3597               both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
3598               to these policies.</para>
3599
3600               <para>The two graphics immediately below show sequence diagrams
3601               illustrating the interaction between the standard resize policy
3602               and its trigger and size policies, respectively.</para>
3603
3604               <figure>
3605                 <title>Standard resize policy trigger sequence
3606                 diagram</title>
3607                 <mediaobject>
3608                   <imageobject>
3609                     <imagedata align="center" format="PNG" scale="100"
3610                                fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram2.png"/>
3611                   </imageobject>
3612                   <textobject>
3613                     <phrase>Standard resize policy trigger sequence
3614                     diagram</phrase>
3615                   </textobject>
3616                 </mediaobject>
3617               </figure>
3618
3619               <figure>
3620                 <title>Standard resize policy size sequence
3621                 diagram</title>
3622                 <mediaobject>
3623                   <imageobject>
3624                     <imagedata align="center" format="PNG" scale="100"
3625                                fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_insert_resize_sequence_diagram3.png"/>
3626                   </imageobject>
3627                   <textobject>
3628                     <phrase>Standard resize policy size sequence
3629                     diagram</phrase>
3630                   </textobject>
3631                 </mediaobject>
3632               </figure>
3633
3634
3635             </section>
3636
3637             <section xml:id="resize_policies.impl.predefined">
3638               <info><title>Predefined Policies</title></info>
3639               <para>The library includes the following
3640               instantiations of size and trigger policies:</para>
3641
3642               <orderedlist>
3643                 <listitem><para><classname>hash_load_check_resize_trigger</classname>
3644                 implements a load check trigger policy.</para></listitem>
3645
3646                 <listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3647                 implements a collision check trigger policy.</para></listitem>
3648
3649                 <listitem><para><classname>hash_exponential_size_policy</classname>
3650                 implements an exponential-size policy (which should be used
3651                 with mask range hashing).</para></listitem>
3652
3653                 <listitem><para><classname>hash_prime_size_policy</classname>
3654                 implementing a size policy based on a sequence of primes
3655                 (which should
3656                 be used with mod range hashing</para></listitem>
3657               </orderedlist>
3658
3659               <para>The graphic below gives an overall picture of the resize-related
3660               classes. <classname>basic_hash_table</classname>
3661               is parametrized by <classname>Resize_Policy</classname>, which it subclasses
3662               publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
3663               <classname>hash_standard_resize_policy</classname>
3664               itself is parametrized by <classname>Trigger_Policy</classname> and
3665               <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
3666               instantiated by <classname>hash_load_check_resize_trigger</classname>,
3667               or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
3668               <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
3669               or <classname>hash_prime_size_policy</classname>.</para>
3670
3671             </section>
3672
3673             <section xml:id="resize_policies.impl.internals">
3674               <info><title>Controling Access to Internals</title></info>
3675
3676               <para>There are cases where (controlled) access to resize
3677               policies' internals is beneficial. E.g., it is sometimes
3678               useful to query a hash-table for the table's actual size (as
3679               opposed to its <function>size()</function> - the number of values it
3680               currently holds); it is sometimes useful to set a table's
3681               initial size, externally resize it, or change load factors.</para>
3682
3683               <para>Clearly, supporting such methods both decreases the
3684               encapsulation of hash-based containers, and increases the
3685               diversity between different associative-containers' interfaces.
3686               Conversely, omitting such methods can decrease containers'
3687               flexibility.</para>
3688
3689               <para>In order to avoid, to the extent possible, the above
3690               conflict, the hash-based containers themselves do not address
3691               any of these questions; this is deferred to the resize policies,
3692               which are easier to change or replace. Thus, for example,
3693               neither <classname>cc_hash_table</classname> nor
3694               <classname>gp_hash_table</classname>
3695               contain methods for querying the actual size of the table; this
3696               is deferred to <classname>hash_standard_resize_policy</classname>.</para>
3697
3698               <para>Furthermore, the policies themselves are parametrized by
3699               template arguments that determine the methods they support
3700               (
3701               <xref linkend="biblio.alexandrescu01modern"/>
3702               shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3703               is parametrized by <classname>External_Size_Access</classname> that
3704               determines whether it supports methods for querying the actual
3705               size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3706               is parametrized by <classname>External_Load_Access</classname> that
3707               determines whether it supports methods for querying or
3708               modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3709               is parametrized by <classname>External_Load_Access</classname> that
3710               determines whether it supports methods for querying the
3711               load.</para>
3712
3713               <para>Some operations, for example, resizing a container at
3714               run time, or changing the load factors of a load-check trigger
3715               policy, require the container itself to resize. As mentioned
3716               above, the hash-based containers themselves do not contain
3717               these types of methods, only their resize policies.
3718               Consequently, there must be some mechanism for a resize policy
3719               to manipulate the hash-based container. As the hash-based
3720               container is a subclass of the resize policy, this is done
3721               through virtual methods. Each hash-based container has a
3722               <classname>private</classname> <classname>virtual</classname> method:</para>
3723               <programlisting>
3724                 virtual void
3725                 do_resize
3726                 (size_type new_size);
3727               </programlisting>
3728
3729               <para>which resizes the container. Implementations of
3730               <classname>Resize_Policy</classname> can export public methods for resizing
3731               the container externally; these methods internally call
3732               <classname>do_resize</classname> to resize the table.</para>
3733
3734
3735             </section>
3736
3737           </section>
3738
3739
3740         </section> <!-- resize policies -->
3741
3742         <section xml:id="container.hash.details.policy_interaction">
3743           <info><title>Policy Interactions</title></info>
3744           <para>
3745           </para>
3746           <para>Hash-tables are unfortunately especially susceptible to
3747           choice of policies. One of the more complicated aspects of this
3748           is that poor combinations of good policies can form a poor
3749           container. Following are some considerations.</para>
3750
3751           <section xml:id="policy_interaction.probesizetrigger">
3752             <info><title>probe/size/trigger</title></info>
3753
3754             <para>Some combinations do not work well for probing containers.
3755             For example, combining a quadratic probe policy with an
3756             exponential size policy can yield a poor container: when an
3757             element is inserted, a trigger policy might decide that there
3758             is no need to resize, as the table still contains unused
3759             entries; the probe sequence, however, might never reach any of
3760             the unused entries.</para>
3761
3762             <para>Unfortunately, this library cannot detect such problems at
3763             compilation (they are halting reducible). It therefore defines
3764             an exception class <classname>insert_error</classname> to throw an
3765             exception in this case.</para>
3766
3767           </section>
3768
3769           <section xml:id="policy_interaction.hashtrigger">
3770             <info><title>hash/trigger</title></info>
3771
3772             <para>Some trigger policies are especially susceptible to poor
3773             hash functions. Suppose, as an extreme case, that the hash
3774             function transforms each key to the same hash value. After some
3775             inserts, a collision detecting policy will always indicate that
3776             the container needs to grow.</para>
3777
3778             <para>The library, therefore, by design, limits each operation to
3779             one resize. For each <classname>insert</classname>, for example, it queries
3780             only once whether a resize is needed.</para>
3781
3782           </section>
3783
3784           <section xml:id="policy_interaction.eqstorehash">
3785             <info><title>equivalence functors/storing hash values/hash</title></info>
3786
3787             <para><classname>cc_hash_table</classname> and
3788             <classname>gp_hash_table</classname> are
3789             parametrized by an equivalence functor and by a
3790             <classname>Store_Hash</classname> parameter. If the latter parameter is
3791             <classname>true</classname>, then the container stores with each entry
3792             a hash value, and uses this value in case of collisions to
3793             determine whether to apply a hash value. This can lower the
3794             cost of collision for some types, but increase the cost of
3795             collisions for other types.</para>
3796
3797             <para>If a ranged-hash function or ranged probe function is
3798             directly supplied, however, then it makes no sense to store the
3799             hash value with each entry. This library's container will
3800             fail at compilation, by design, if this is attempted.</para>
3801
3802           </section>
3803
3804           <section xml:id="policy_interaction.sizeloadtrigger">
3805             <info><title>size/load-check trigger</title></info>
3806
3807             <para>Assume a size policy issues an increasing sequence of sizes
3808             a, a q, a q<superscript>1</superscript>, a q<superscript>2</superscript>, ... For
3809             example, an exponential size policy might issue the sequence of
3810             sizes 8, 16, 32, 64, ...</para>
3811
3812             <para>If a load-check trigger policy is used, with loads
3813             α<subscript>min</subscript> and α<subscript>max</subscript>,
3814             respectively, then it is a good idea to have:</para>
3815
3816             <orderedlist>
3817               <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>
3818
3819               <listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
3820             </orderedlist>
3821
3822             <para>This will ensure that the amortized hash cost of each
3823             modifying operation is at most approximately 3.</para>
3824
3825             <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
3826             any case, a bad choice, and α<subscript>min</subscript> &gt;
3827             α <subscript>max</subscript> is horrendous.</para>
3828
3829           </section>
3830
3831         </section>
3832
3833       </section> <!-- details -->
3834
3835     </section> <!-- hash -->
3836
3837     <!-- tree -->
3838     <section xml:id="pbds.design.container.tree">
3839       <info><title>tree</title></info>
3840
3841       <section xml:id="container.tree.interface">
3842         <info><title>Interface</title></info>
3843
3844         <para>The tree-based container has the following declaration:</para>
3845         <programlisting>
3846           template&lt;
3847           typename Key,
3848           typename Mapped,
3849           typename Cmp_Fn = std::less&lt;Key&gt;,
3850           typename Tag = rb_tree_tag,
3851           template&lt;
3852           typename Const_Node_Iterator,
3853           typename Node_Iterator,
3854           typename Cmp_Fn_,
3855           typename Allocator_&gt;
3856           class Node_Update = null_node_update,
3857           typename Allocator = std::allocator&lt;char&gt; &gt;
3858           class tree;
3859         </programlisting>
3860
3861         <para>The parameters have the following meaning:</para>
3862
3863         <orderedlist>
3864           <listitem>
3865           <para><classname>Key</classname> is the key type.</para></listitem>
3866
3867           <listitem>
3868           <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
3869
3870           <listitem>
3871           <para><classname>Cmp_Fn</classname> is a key comparison functor</para></listitem>
3872
3873           <listitem>
3874             <para><classname>Tag</classname> specifies which underlying data structure
3875           to use.</para></listitem>
3876
3877           <listitem>
3878             <para><classname>Node_Update</classname> is a policy for updating node
3879           invariants.</para></listitem>
3880
3881           <listitem>
3882             <para><classname>Allocator</classname> is an allocator
3883           type.</para></listitem>
3884         </orderedlist>
3885
3886         <para>The <classname>Tag</classname> parameter specifies which underlying
3887         data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
3888         <classname>ov_tree_tag</classname>,
3889         specifies an underlying red-black tree, splay tree, or
3890         ordered-vector tree, respectively; any other tag is illegal.
3891         Note that containers based on the former two contain more types
3892         and methods than the latter (e.g.,
3893         <classname>reverse_iterator</classname> and <classname>rbegin</classname>), and different
3894         exception and invalidation guarantees.</para>
3895
3896       </section>
3897
3898       <section xml:id="container.tree.details">
3899         <info><title>Details</title></info>
3900
3901         <section xml:id="container.tree.node">
3902           <info><title>Node Invariants</title></info>
3903
3904
3905           <para>Consider the two trees in the graphic below, labels A and B. The first
3906           is a tree of floats; the second is a tree of pairs, each
3907           signifying a geometric line interval. Each element in a tree is refered to as a node of the tree. Of course, each of
3908           these trees can support the usual queries: the first can easily
3909           search for <classname>0.4</classname>; the second can easily search for
3910           <classname>std::make_pair(10, 41)</classname>.</para>
3911
3912           <para>Each of these trees can efficiently support other queries.
3913           The first can efficiently determine that the 2rd key in the
3914           tree is <constant>0.3</constant>; the second can efficiently determine
3915           whether any of its intervals overlaps
3916           <programlisting>std::make_pair(29,42)</programlisting> (useful in geometric
3917           applications or distributed file systems with leases, for
3918           example).  It should be noted that an <classname>std::set</classname> can
3919           only solve these types of problems with linear complexity.</para>
3920
3921           <para>In order to do so, each tree stores some metadata in
3922           each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>.) The first stores in
3923           each node the size of the sub-tree rooted at the node; the
3924           second stores at each node the maximal endpoint of the
3925           intervals at the sub-tree rooted at the node.</para>
3926
3927           <figure>
3928             <title>Tree node invariants</title>
3929             <mediaobject>
3930               <imageobject>
3931                 <imagedata align="center" format="PNG" scale="100"
3932                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_invariants.png"/>
3933               </imageobject>
3934               <textobject>
3935                 <phrase>Tree node invariants</phrase>
3936               </textobject>
3937             </mediaobject>
3938           </figure>
3939
3940           <para>Supporting such trees is difficult for a number of
3941           reasons:</para>
3942
3943           <orderedlist>
3944             <listitem><para>There must be a way to specify what a node's metadata
3945             should be (if any).</para></listitem>
3946
3947             <listitem><para>Various operations can invalidate node
3948             invariants.  The graphic below shows how a right rotation,
3949             performed on A, results in B, with nodes x and y having
3950             corrupted invariants (the grayed nodes in C). The graphic shows
3951             how an insert, performed on D, results in E, with nodes x and y
3952             having corrupted invariants (the grayed nodes in F). It is not
3953             feasible to know outside the tree the effect of an operation on
3954             the nodes of the tree.</para></listitem>
3955
3956             <listitem><para>The search paths of standard associative containers are
3957             defined by comparisons between keys, and not through
3958             metadata.</para></listitem>
3959
3960             <listitem><para>It is not feasible to know in advance which methods trees
3961             can support. Besides the usual <classname>find</classname> method, the
3962             first tree can support a <classname>find_by_order</classname> method, while
3963             the second can support an <classname>overlaps</classname> method.</para></listitem>
3964           </orderedlist>
3965
3966           <figure>
3967             <title>Tree node invalidation</title>
3968             <mediaobject>
3969               <imageobject>
3970                 <imagedata align="center" format="PNG" scale="100"
3971                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_invalidations.png"/>
3972               </imageobject>
3973               <textobject>
3974                 <phrase>Tree node invalidation</phrase>
3975               </textobject>
3976             </mediaobject>
3977           </figure>
3978
3979           <para>These problems are solved by a combination of two means:
3980           node iterators, and template-template node updater
3981           parameters.</para>
3982
3983           <section xml:id="container.tree.node.iterators">
3984             <info><title>Node Iterators</title></info>
3985
3986
3987             <para>Each tree-based container defines two additional iterator
3988             types, <classname>const_node_iterator</classname>
3989             and <classname>node_iterator</classname>.
3990             These iterators allow descending from a node to one of its
3991             children. Node iterator allow search paths different than those
3992             determined by the comparison functor. The <classname>tree</classname>
3993             supports the methods:</para>
3994             <programlisting>
3995               const_node_iterator
3996               node_begin() const;
3997
3998               node_iterator
3999               node_begin();
4000
4001               const_node_iterator
4002               node_end() const;
4003
4004               node_iterator
4005               node_end();
4006             </programlisting>
4007
4008             <para>The first pairs return node iterators corresponding to the
4009             root node of the tree; the latter pair returns node iterators
4010             corresponding to a just-after-leaf node.</para>
4011           </section>
4012
4013           <section xml:id="container.tree.node.updator">
4014             <info><title>Node Updator</title></info>
4015
4016             <para>The tree-based containers are parametrized by a
4017             <classname>Node_Update</classname> template-template parameter. A
4018             tree-based container instantiates
4019             <classname>Node_Update</classname> to some
4020             <classname>node_update</classname> class, and publicly subclasses
4021             <classname>node_update</classname>. The graphic below shows this
4022             scheme, as well as some predefined policies (which are explained
4023             below).</para>
4024
4025             <figure>
4026               <title>A tree and its update policy</title>
4027               <mediaobject>
4028                 <imageobject>
4029                   <imagedata align="center" format="PNG" scale="100"
4030                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_tree_node_updator_policy_cd.png"/>
4031                 </imageobject>
4032                 <textobject>
4033                   <phrase>A tree and its update policy</phrase>
4034                 </textobject>
4035               </mediaobject>
4036             </figure>
4037
4038             <para><classname>node_update</classname> (an instantiation of
4039             <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4040             the type of metadata it requires. For order statistics,
4041             e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4042             The tree defines within each node a <classname>metadata_type</classname>
4043             object.</para>
4044
4045             <para><classname>node_update</classname> must also define the following method
4046             for restoring node invariants:</para>
4047             <programlisting>
4048               void
4049               operator()(node_iterator nd_it, const_node_iterator end_nd_it)
4050             </programlisting>
4051
4052             <para>In this method, <varname>nd_it</varname> is a
4053             <classname>node_iterator</classname> corresponding to a node whose
4054             A) all descendants have valid invariants, and B) its own
4055             invariants might be violated; <classname>end_nd_it</classname> is
4056             a <classname>const_node_iterator</classname> corresponding to a
4057             just-after-leaf node. This method should correct the node
4058             invariants of the node pointed to by
4059             <classname>nd_it</classname>. For example, say node x in the
4060             graphic below label A has an invalid invariant, but its' children,
4061             y and z have valid invariants. After the invocation, all three
4062             nodes should have valid invariants, as in label B.</para>
4063
4064
4065             <figure>
4066               <title>Restoring node invariants</title>
4067               <mediaobject>
4068                 <imageobject>
4069                   <imagedata align="center" format="PNG" scale="100"
4070                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_restoring_node_invariants.png"/>
4071                 </imageobject>
4072                 <textobject>
4073                   <phrase>Restoring node invariants</phrase>
4074                 </textobject>
4075               </mediaobject>
4076             </figure>
4077
4078             <para>When a tree operation might invalidate some node invariant,
4079             it invokes this method in its <classname>node_update</classname> base to
4080             restore the invariant. For example, the graphic below shows
4081             an <function>insert</function> operation (point A); the tree performs some
4082             operations, and calls the update functor three times (points B,
4083             C, and D). (It is well known that any <function>insert</function>,
4084             <function>erase</function>, <function>split</function> or <function>join</function>, can restore
4085             all node invariants by a small number of node invariant updates (<xref linkend="biblio.clrs2001"/>)
4086             .</para>
4087
4088             <figure>
4089               <title>Insert update sequence</title>
4090               <mediaobject>
4091                 <imageobject>
4092                   <imagedata align="center" format="PNG" scale="100"
4093                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_update_seq_diagram.png"/>
4094                 </imageobject>
4095                 <textobject>
4096                   <phrase>Insert update sequence</phrase>
4097                 </textobject>
4098               </mediaobject>
4099             </figure>
4100
4101             <para>To complete the description of the scheme, three questions
4102             need to be answered:</para>
4103
4104             <orderedlist>
4105               <listitem><para>How can a tree which supports order statistics define a
4106               method such as <classname>find_by_order</classname>?</para></listitem>
4107
4108               <listitem><para>How can the node updater base access methods of the
4109               tree?</para></listitem>
4110
4111               <listitem><para>How can the following cyclic dependency be resolved?
4112               <classname>node_update</classname> is a base class of the tree, yet it
4113               uses node iterators defined in the tree (its child).</para></listitem>
4114             </orderedlist>
4115
4116             <para>The first two questions are answered by the fact that
4117             <classname>node_update</classname> (an instantiation of
4118             <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4119             of the tree. Consequently:</para>
4120
4121             <orderedlist>
4122               <listitem><para>Any public methods of
4123               <classname>node_update</classname> are automatically methods of
4124               the tree (<xref linkend="biblio.alexandrescu01modern"/>).
4125               Thus an order-statistics node updater,
4126               <classname>tree_order_statistics_node_update</classname> defines
4127               the <function>find_by_order</function> method; any tree
4128               instantiated by this policy consequently supports this method as
4129               well.</para></listitem>
4130
4131               <listitem><para>In C++, if a base class declares a method as
4132               <literal>virtual</literal>, it is
4133               <literal>virtual</literal> in its subclasses. If
4134               <classname>node_update</classname> needs to access one of the
4135               tree's methods, say the member function
4136               <function>end</function>, it simply declares that method as
4137               <literal>virtual</literal> abstract.</para></listitem>
4138             </orderedlist>
4139
4140             <para>The cyclic dependency is solved through template-template
4141             parameters. <classname>Node_Update</classname> is parametrized by
4142             the tree's node iterators, its comparison functor, and its
4143             allocator type. Thus, instantiations of
4144             <classname>Node_Update</classname> have all information
4145             required.</para>
4146
4147             <para>This library assumes that constructing a metadata object and
4148             modifying it are exception free. Suppose that during some method,
4149             say <classname>insert</classname>, a metadata-related operation
4150             (e.g., changing the value of a metadata) throws an exception. Ack!
4151             Rolling back the method is unusually complex.</para>
4152
4153             <para>Previously, a distinction was made between redundant
4154             policies and null policies. Node invariants show a
4155             case where null policies are required.</para>
4156
4157             <para>Assume a regular tree is required, one which need not
4158             support order statistics or interval overlap queries.
4159             Seemingly, in this case a redundant policy - a policy which
4160             doesn't affect nodes' contents would suffice. This, would lead
4161             to the following drawbacks:</para>
4162
4163             <orderedlist>
4164               <listitem><para>Each node would carry a useless metadata object, wasting
4165               space.</para></listitem>
4166
4167               <listitem><para>The tree cannot know if its
4168               <classname>Node_Update</classname> policy actually modifies a
4169               node's metadata (this is halting reducible). In the graphic
4170               below, assume the shaded node is inserted. The tree would have
4171               to traverse the useless path shown to the root, applying
4172               redundant updates all the way.</para></listitem>
4173             </orderedlist>
4174             <figure>
4175               <title>Useless update path</title>
4176               <mediaobject>
4177                 <imageobject>
4178                   <imagedata align="center" format="PNG" scale="100"
4179                              fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_rationale_null_node_updator.png"/>
4180                 </imageobject>
4181                 <textobject>
4182                   <phrase>Useless update path</phrase>
4183                 </textobject>
4184               </mediaobject>
4185             </figure>
4186
4187
4188             <para>A null policy class, <classname>null_node_update</classname>
4189             solves both these problems. The tree detects that node
4190             invariants are irrelevant, and defines all accordingly.</para>
4191
4192           </section>
4193
4194         </section>
4195
4196         <section xml:id="container.tree.details.split">
4197           <info><title>Split and Join</title></info>
4198
4199           <para>Tree-based containers support split and join methods.
4200           It is possible to split a tree so that it passes
4201           all nodes with keys larger than a given key to a different
4202           tree. These methods have the following advantages over the
4203           alternative of externally inserting to the destination
4204           tree and erasing from the source tree:</para>
4205
4206           <orderedlist>
4207             <listitem><para>These methods are efficient - red-black trees are split
4208             and joined in poly-logarithmic complexity; ordered-vector
4209             trees are split and joined at linear complexity. The
4210             alternatives have super-linear complexity.</para></listitem>
4211
4212             <listitem><para>Aside from orders of growth, these operations perform
4213             few allocations and de-allocations. For red-black trees, allocations are not performed,
4214             and the methods are exception-free. </para></listitem>
4215           </orderedlist>
4216         </section>
4217
4218       </section> <!-- details -->
4219
4220     </section> <!-- tree -->
4221
4222     <!-- trie -->
4223     <section xml:id="pbds.design.container.trie">
4224       <info><title>Trie</title></info>
4225
4226       <section xml:id="container.trie.interface">
4227         <info><title>Interface</title></info>
4228
4229         <para>The trie-based container has the following declaration:</para>
4230         <programlisting>
4231           template&lt;typename Key,
4232           typename Mapped,
4233           typename Cmp_Fn = std::less&lt;Key&gt;,
4234           typename Tag = pat_trie_tag,
4235           template&lt;typename Const_Node_Iterator,
4236           typename Node_Iterator,
4237           typename E_Access_Traits_,
4238           typename Allocator_&gt;
4239           class Node_Update = null_node_update,
4240           typename Allocator = std::allocator&lt;char&gt; &gt;
4241           class trie;
4242         </programlisting>
4243
4244         <para>The parameters have the following meaning:</para>
4245
4246         <orderedlist>
4247           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4248
4249           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4250
4251           <listitem><para><classname>E_Access_Traits</classname> is described in below.</para></listitem>
4252
4253           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4254           to use, and is described shortly.</para></listitem>
4255
4256           <listitem><para><classname>Node_Update</classname> is a policy for updating node
4257           invariants. This is described below.</para></listitem>
4258
4259           <listitem><para><classname>Allocator</classname> is an allocator
4260           type.</para></listitem>
4261         </orderedlist>
4262
4263         <para>The <classname>Tag</classname> parameter specifies which underlying
4264         data structure to use. Instantiating it by <classname>pat_trie_tag</classname>, specifies an
4265         underlying PATRICIA trie (explained shortly); any other tag is
4266         currently illegal.</para>
4267
4268         <para>Following is a description of a (PATRICIA) trie
4269         (this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
4270         <xref linkend="biblio.filliatre2000ptset"/>).
4271         </para>
4272
4273         <para>A (PATRICIA) trie is similar to a tree, but with the
4274         following differences:</para>
4275
4276         <orderedlist>
4277           <listitem><para>It explicitly views keys as a sequence of elements.
4278           E.g., a trie can view a string as a sequence of
4279           characters; a trie can view a number as a sequence of
4280           bits.</para></listitem>
4281
4282           <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4283           + 1, where n is the number of distinct
4284           elements.</para></listitem>
4285
4286           <listitem><para>It stores values only at leaf nodes.</para></listitem>
4287
4288           <listitem><para>Internal nodes have the properties that A) each has at
4289           least two children, and B) each shares the same prefix with
4290           any of its descendant.</para></listitem>
4291         </orderedlist>
4292
4293         <para>A (PATRICIA) trie has some useful properties:</para>
4294
4295         <orderedlist>
4296           <listitem><para>It can be configured to use large node fan-out, giving it
4297           very efficient find performance (albeit at insertion
4298           complexity and size).</para></listitem>
4299
4300           <listitem><para>It works well for common-prefix keys.</para></listitem>
4301
4302           <listitem><para>It can support efficiently queries such as which
4303           keys match a certain prefix. This is sometimes useful in file
4304           systems and routers, and for "type-ahead" aka predictive text matching
4305           on mobile devices.</para></listitem>
4306         </orderedlist>
4307
4308
4309       </section>
4310
4311       <section xml:id="container.trie.details">
4312         <info><title>Details</title></info>
4313
4314         <section xml:id="container.trie.details.etraits">
4315           <info><title>Element Access Traits</title></info>
4316
4317           <para>A trie inherently views its keys as sequences of elements.
4318           For example, a trie can view a string as a sequence of
4319           characters. A trie needs to map each of n elements to a
4320           number in {0, n - 1}. For example, a trie can map a
4321           character <varname>c</varname> to
4322           <programlisting>static_cast&lt;size_t&gt;(c)</programlisting>.</para>
4323
4324           <para>Seemingly, then, a trie can assume that its keys support
4325           (const) iterators, and that the <classname>value_type</classname> of this
4326           iterator can be cast to a <classname>size_t</classname>. There are several
4327           reasons, though, to decouple the mechanism by which the trie
4328           accesses its keys' elements from the trie:</para>
4329
4330           <orderedlist>
4331             <listitem><para>In some cases, the numerical value of an element is
4332             inappropriate. Consider a trie storing DNA strings. It is
4333             logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4334             'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4335
4336             <listitem><para>In some cases the keys' iterators are different than what
4337             is needed. For example, a trie can be used to search for
4338             common suffixes, by using strings'
4339             <classname>reverse_iterator</classname>. As another example, a trie mapping
4340             UNICODE strings would have a huge fan-out if each node would
4341             branch on a UNICODE character; instead, one can define an
4342             iterator iterating over 8-bit (or less) groups.</para></listitem>
4343           </orderedlist>
4344
4345           <para>trie is,
4346           consequently, parametrized by <classname>E_Access_Traits</classname> -
4347           traits which instruct how to access sequences' elements.
4348           <classname>string_trie_e_access_traits</classname>
4349           is a traits class for strings. Each such traits define some
4350           types, like:</para>
4351           <programlisting>
4352             typename E_Access_Traits::const_iterator
4353           </programlisting>
4354
4355           <para>is a const iterator iterating over a key's elements. The
4356           traits class must also define methods for obtaining an iterator
4357           to the first and last element of a key.</para>
4358
4359           <para>The graphic below shows a
4360           (PATRICIA) trie resulting from inserting the words: "I wish
4361           that I could ever see a poem lovely as a trie" (which,
4362           unfortunately, does not rhyme).</para>
4363
4364           <para>The leaf nodes contain values; each internal node contains
4365           two <classname>typename E_Access_Traits::const_iterator</classname>
4366           objects, indicating the maximal common prefix of all keys in
4367           the sub-tree. For example, the shaded internal node roots a
4368           sub-tree with leafs "a" and "as". The maximal common prefix is
4369           "a". The internal node contains, consequently, to const
4370           iterators, one pointing to <varname>'a'</varname>, and the other to
4371           <varname>'s'</varname>.</para>
4372
4373           <figure>
4374             <title>A PATRICIA trie</title>
4375             <mediaobject>
4376               <imageobject>
4377                 <imagedata align="center" format="PNG" scale="100"
4378                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_pat_trie.png"/>
4379               </imageobject>
4380               <textobject>
4381                 <phrase>A PATRICIA trie</phrase>
4382               </textobject>
4383             </mediaobject>
4384           </figure>
4385
4386         </section>
4387
4388         <section xml:id="container.trie.details.node">
4389           <info><title>Node Invariants</title></info>
4390
4391           <para>Trie-based containers support node invariants, as do
4392           tree-based containers. There are two minor
4393           differences, though, which, unfortunately, thwart sharing them
4394           sharing the same node-updating policies:</para>
4395
4396           <orderedlist>
4397             <listitem>
4398               <para>A trie's <classname>Node_Update</classname> template-template
4399               parameter is parametrized by <classname>E_Access_Traits</classname>, while
4400               a tree's <classname>Node_Update</classname> template-template parameter is
4401             parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4402
4403             <listitem><para>Tree-based containers store values in all nodes, while
4404             trie-based containers (at least in this implementation) store
4405             values in leafs.</para></listitem>
4406           </orderedlist>
4407
4408           <para>The graphic below shows the scheme, as well as some predefined
4409           policies (which are explained below).</para>
4410
4411           <figure>
4412             <title>A trie and its update policy</title>
4413             <mediaobject>
4414               <imageobject>
4415                 <imagedata align="center" format="PNG" scale="100"
4416                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_trie_node_updator_policy_cd.png"/>
4417               </imageobject>
4418               <textobject>
4419                 <phrase>A trie and its update policy</phrase>
4420               </textobject>
4421             </mediaobject>
4422           </figure>
4423
4424
4425           <para>This library offers the following pre-defined trie node
4426           updating policies:</para>
4427
4428           <orderedlist>
4429             <listitem>
4430               <para>
4431                 <classname>trie_order_statistics_node_update</classname>
4432                 supports order statistics.
4433               </para>
4434             </listitem>
4435
4436             <listitem><para><classname>trie_prefix_search_node_update</classname>
4437             supports searching for ranges that match a given prefix.</para></listitem>
4438
4439             <listitem><para><classname>null_node_update</classname>
4440             is the null node updater.</para></listitem>
4441           </orderedlist>
4442
4443         </section>
4444
4445         <section xml:id="container.trie.details.split">
4446           <info><title>Split and Join</title></info>
4447           <para>Trie-based containers support split and join methods; the
4448           rationale is equal to that of tree-based containers supporting
4449           these methods.</para>
4450         </section>
4451
4452       </section> <!-- details -->
4453
4454     </section> <!-- trie -->
4455
4456     <!-- list_update -->
4457     <section xml:id="pbds.design.container.list">
4458       <info><title>List</title></info>
4459
4460       <section xml:id="container.list.interface">
4461         <info><title>Interface</title></info>
4462
4463         <para>The list-based container has the following declaration:</para>
4464         <programlisting>
4465           template&lt;typename Key,
4466           typename Mapped,
4467           typename Eq_Fn = std::equal_to&lt;Key&gt;,
4468           typename Update_Policy = move_to_front_lu_policy&lt;&gt;,
4469           typename Allocator = std::allocator&lt;char&gt; &gt;
4470           class list_update;
4471         </programlisting>
4472
4473         <para>The parameters have the following meaning:</para>
4474
4475         <orderedlist>
4476           <listitem>
4477             <para>
4478               <classname>Key</classname> is the key type.
4479             </para>
4480           </listitem>
4481
4482           <listitem>
4483             <para>
4484               <classname>Mapped</classname> is the mapped-policy.
4485             </para>
4486           </listitem>
4487
4488           <listitem>
4489             <para>
4490               <classname>Eq_Fn</classname> is a key equivalence functor.
4491             </para>
4492           </listitem>
4493
4494           <listitem>
4495             <para>
4496               <classname>Update_Policy</classname> is a policy updating positions in
4497               the list based on access patterns. It is described in the
4498               following subsection.
4499             </para>
4500           </listitem>
4501
4502           <listitem>
4503             <para>
4504               <classname>Allocator</classname> is an allocator type.
4505             </para>
4506           </listitem>
4507         </orderedlist>
4508
4509         <para>A list-based associative container is a container that
4510         stores elements in a linked-list. It does not order the elements
4511         by any particular order related to the keys.  List-based
4512         containers are primarily useful for creating "multimaps". In fact,
4513         list-based containers are designed in this library expressly for
4514         this purpose.</para>
4515
4516         <para>List-based containers might also be useful for some rare
4517         cases, where a key is encapsulated to the extent that only
4518         key-equivalence can be tested. Hash-based containers need to know
4519         how to transform a key into a size type, and tree-based containers
4520         need to know if some key is larger than another.  List-based
4521         associative containers, conversely, only need to know if two keys
4522         are equivalent.</para>
4523
4524         <para>Since a list-based associative container does not order
4525         elements by keys, is it possible to order the list in some
4526         useful manner? Remarkably, many on-line competitive
4527         algorithms exist for reordering lists to reflect access
4528         prediction. (See <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
4529         </para>
4530
4531       </section>
4532
4533       <section xml:id="container.list.details">
4534         <info><title>Details</title></info>
4535         <para>
4536         </para>
4537         <section xml:id="container.list.details.ds">
4538           <info><title>Underlying Data Structure</title></info>
4539
4540           <para>The graphic below shows a
4541           simple list of integer keys. If we search for the integer 6, we
4542           are paying an overhead: the link with key 6 is only the fifth
4543           link; if it were the first link, it could be accessed
4544           faster.</para>
4545
4546           <figure>
4547             <title>A simple list</title>
4548             <mediaobject>
4549               <imageobject>
4550                 <imagedata align="center" format="PNG" scale="100"
4551                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_simple_list.png"/>
4552               </imageobject>
4553               <textobject>
4554                 <phrase>A simple list</phrase>
4555               </textobject>
4556             </mediaobject>
4557           </figure>
4558
4559           <para>List-update algorithms reorder lists as elements are
4560           accessed. They try to determine, by the access history, which
4561           keys to move to the front of the list. Some of these algorithms
4562           require adding some metadata alongside each entry.</para>
4563
4564           <para>For example, in the graphic below label A shows the counter
4565           algorithm. Each node contains both a key and a count metadata
4566           (shown in bold). When an element is accessed (e.g. 6) its count is
4567           incremented, as shown in label B. If the count reaches some
4568           predetermined value, say 10, as shown in label C, the count is set
4569           to 0 and the node is moved to the front of the list, as in label
4570           D.
4571           </para>
4572
4573           <figure>
4574             <title>The counter algorithm</title>
4575             <mediaobject>
4576               <imageobject>
4577                 <imagedata align="center" format="PNG" scale="100"
4578                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_list_update.png"/>
4579               </imageobject>
4580               <textobject>
4581                 <phrase>The counter algorithm</phrase>
4582               </textobject>
4583             </mediaobject>
4584           </figure>
4585
4586
4587         </section>
4588
4589         <section xml:id="container.list.details.policies">
4590           <info><title>Policies</title></info>
4591
4592           <para>this library allows instantiating lists with policies
4593           implementing any algorithm moving nodes to the front of the
4594           list (policies implementing algorithms interchanging nodes are
4595           unsupported).</para>
4596
4597           <para>Associative containers based on lists are parametrized by a
4598           <classname>Update_Policy</classname> parameter. This parameter defines the
4599           type of metadata each node contains, how to create the
4600           metadata, and how to decide, using this metadata, whether to
4601           move a node to the front of the list. A list-based associative
4602           container object derives (publicly) from its update policy.
4603           </para>
4604
4605           <para>An instantiation of <classname>Update_Policy</classname> must define
4606           internally <classname>update_metadata</classname> as the metadata it
4607           requires. Internally, each node of the list contains, besides
4608           the usual key and data, an instance of <classname>typename
4609           Update_Policy::update_metadata</classname>.</para>
4610
4611           <para>An instantiation of <classname>Update_Policy</classname> must define
4612           internally two operators:</para>
4613           <programlisting>
4614             update_metadata
4615             operator()();
4616
4617             bool
4618             operator()(update_metadata &amp;);
4619           </programlisting>
4620
4621           <para>The first is called by the container object, when creating a
4622           new node, to create the node's metadata. The second is called
4623           by the container object, when a node is accessed (
4624           when a find operation's key is equivalent to the key of the
4625           node), to determine whether to move the node to the front of
4626           the list.
4627           </para>
4628
4629           <para>The library contains two predefined implementations of
4630           list-update policies. The first
4631           is <classname>lu_counter_policy</classname>, which implements the
4632           counter algorithm described above. The second is
4633           <classname>lu_move_to_front_policy</classname>,
4634           which unconditionally move an accessed element to the front of
4635           the list. The latter type is very useful in this library,
4636           since there is no need to associate metadata with each element.
4637           (See <xref linkend="biblio.andrew04mtf"/>
4638           </para>
4639
4640         </section>
4641
4642         <section xml:id="container.list.details.mapped">
4643           <info><title>Use in Multimaps</title></info>
4644
4645           <para>In this library, there are no equivalents for the standard's
4646           multimaps and multisets; instead one uses an associative
4647           container mapping primary keys to secondary keys.</para>
4648
4649           <para>List-based containers are especially useful as associative
4650           containers for secondary keys. In fact, they are implemented
4651           here expressly for this purpose.</para>
4652
4653           <para>To begin with, these containers use very little per-entry
4654           structure memory overhead, since they can be implemented as
4655           singly-linked lists. (Arrays use even lower per-entry memory
4656           overhead, but they are less flexible in moving around entries,
4657           and have weaker invalidation guarantees).</para>
4658
4659           <para>More importantly, though, list-based containers use very
4660           little per-container memory overhead. The memory overhead of an
4661           empty list-based container is practically that of a pointer.
4662           This is important for when they are used as secondary
4663           associative-containers in situations where the average ratio of
4664           secondary keys to primary keys is low (or even 1).</para>
4665
4666           <para>In order to reduce the per-container memory overhead as much
4667           as possible, they are implemented as closely as possible to
4668           singly-linked lists.</para>
4669
4670           <orderedlist>
4671             <listitem>
4672               <para>
4673                 List-based containers do not store internally the number
4674                 of values that they hold. This means that their <function>size</function>
4675                 method has linear complexity (just like <classname>std::list</classname>).
4676                 Note that finding the number of equivalent-key values in a
4677                 standard multimap also has linear complexity (because it must be
4678                 done,  via <function>std::distance</function> of the
4679                 multimap's <function>equal_range</function> method), but usually with
4680                 higher constants.
4681               </para>
4682             </listitem>
4683
4684             <listitem>
4685               <para>
4686                 Most associative-container objects each hold a policy
4687                 object (a hash-based container object holds a
4688                 hash functor). List-based containers, conversely, only have
4689                 class-wide policy objects.
4690               </para>
4691             </listitem>
4692           </orderedlist>
4693
4694
4695         </section>
4696
4697       </section> <!-- details -->
4698
4699     </section> <!-- list -->
4700
4701
4702     <!-- priority_queue -->
4703     <section xml:id="pbds.design.container.priority_queue">
4704       <info><title>Priority Queue</title></info>
4705
4706       <section xml:id="container.priority_queue.interface">
4707         <info><title>Interface</title></info>
4708
4709         <para>The priority queue container has the following
4710         declaration:
4711         </para>
4712         <programlisting>
4713           template&lt;typename  Value_Type,
4714           typename  Cmp_Fn = std::less&lt;Value_Type&gt;,
4715           typename  Tag = pairing_heap_tag,
4716           typename  Allocator = std::allocator&lt;char &gt; &gt;
4717           class priority_queue;
4718         </programlisting>
4719
4720         <para>The parameters have the following meaning:</para>
4721
4722         <orderedlist>
4723           <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4724
4725           <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor</para></listitem>
4726
4727           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4728           to use.</para></listitem>
4729
4730           <listitem><para><classname>Allocator</classname> is an allocator
4731           type.</para></listitem>
4732         </orderedlist>
4733
4734         <para>The <classname>Tag</classname> parameter specifies which underlying
4735         data structure to use. Instantiating it by<classname>pairing_heap_tag</classname>,<classname>binary_heap_tag</classname>,
4736         <classname>binomial_heap_tag</classname>,
4737         <classname>rc_binomial_heap_tag</classname>,
4738         or <classname>thin_heap_tag</classname>,
4739         specifies, respectively,
4740         an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
4741         binary heap (<xref linkend="biblio.clrs2001"/>),
4742         binomial heap (<xref linkend="biblio.clrs2001"/>),
4743         a binomial heap with a redundant binary counter (<xref linkend="biblio.maverik_lowerbounds"/>),
4744         or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
4745         </para>
4746
4747         <para>
4748           As mentioned in the tutorial,
4749           <classname>__gnu_pbds::priority_queue</classname> shares most of the
4750           same interface with <classname>std::priority_queue</classname>.
4751           E.g. if <varname>q</varname> is a priority queue of type
4752           <classname>Q</classname>, then <function>q.top()</function> will
4753           return the "largest" value in the container (according to
4754           <classname>typename
4755           Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
4756           has a larger (and very slightly different) interface than
4757           <classname>std::priority_queue</classname>, however, since typically
4758           <classname>push</classname> and <classname>pop</classname> are deemed
4759         insufficient for manipulating priority-queues. </para>
4760
4761         <para>Different settings require different priority-queue
4762         implementations which are described in later; see traits
4763         discusses ways to differentiate between the different traits of
4764         different implementations.</para>
4765
4766
4767       </section>
4768
4769       <section xml:id="container.priority_queue.details">
4770         <info><title>Details</title></info>
4771
4772         <section xml:id="container.priority_queue.details.iterators">
4773           <info><title>Iterators</title></info>
4774
4775           <para>There are many different underlying-data structures for
4776           implementing priority queues. Unfortunately, most such
4777           structures are oriented towards making <function>push</function> and
4778           <function>top</function> efficient, and consequently don't allow efficient
4779           access of other elements: for instance, they cannot support an efficient
4780           <function>find</function> method. In the use case where it
4781           is important to both access and "do something with" an
4782           arbitrary value, one would be out of luck. For example, many graph algorithms require
4783           modifying a value (typically increasing it in the sense of the
4784           priority queue's comparison functor).</para>
4785
4786           <para>In order to access and manipulate an arbitrary value in a
4787           priority queue, one needs to reference the internals of the
4788           priority queue from some form of an associative container -
4789           this is unavoidable. Of course, in order to maintain the
4790           encapsulation of the priority queue, this needs to be done in a
4791           way that minimizes exposure to implementation internals.</para>
4792
4793           <para>In this library the priority queue's <function>insert</function>
4794           method returns an iterator, which if valid can be used for subsequent <function>modify</function> and
4795           <function>erase</function> operations. This both preserves the priority
4796           queue's encapsulation, and allows accessing arbitrary values (since the
4797           returned iterators from the <function>push</function> operation can be
4798           stored in some form of associative container).</para>
4799
4800           <para>Priority queues' iterators present a problem regarding their
4801           invalidation guarantees. One assumes that calling
4802           <function>operator++</function> on an iterator will associate it
4803           with the "next" value. Priority-queues are
4804           self-organizing: each operation changes what the "next" value
4805           means. Consequently, it does not make sense that <function>push</function>
4806           will return an iterator that can be incremented - this can have
4807           no possible use. Also, as in the case of hash-based containers,
4808           it is awkward to define if a subsequent <function>push</function> operation
4809           invalidates a prior returned iterator: it invalidates it in the
4810           sense that its "next" value is not related to what it
4811           previously considered to be its "next" value. However, it might not
4812           invalidate it, in the sense that it can be
4813           de-referenced and used for <function>modify</function> and <function>erase</function>
4814           operations.</para>
4815
4816           <para>Similarly to the case of the other unordered associative
4817           containers, this library uses a distinction between
4818           point-type and range type iterators. A priority queue's <classname>iterator</classname> can always be
4819           converted to a <classname>point_iterator</classname>, and a
4820           <classname>const_iterator</classname> can always be converted to a
4821           <classname>point_const_iterator</classname>.</para>
4822
4823           <para>The following snippet demonstrates manipulating an arbitrary
4824           value:</para>
4825           <programlisting>
4826             // A priority queue of integers.
4827             priority_queue&lt;int &gt; p;
4828
4829             // Insert some values into the priority queue.
4830             priority_queue&lt;int &gt;::point_iterator it = p.push(0);
4831
4832             p.push(1);
4833             p.push(2);
4834
4835             // Now modify a value.
4836             p.modify(it, 3);
4837
4838             assert(p.top() == 3);
4839           </programlisting>
4840
4841
4842           <para>It should be noted that an alternative design could embed an
4843           associative container in a priority queue. Could, but most
4844           probably should not. To begin with, it should be noted that one
4845           could always encapsulate a priority queue and an associative
4846           container mapping values to priority queue iterators with no
4847           performance loss. One cannot, however, "un-encapsulate" a priority
4848           queue embedding an associative container, which might lead to
4849           performance loss. Assume, that one needs to associate each value
4850           with some data unrelated to priority queues. Then using
4851           this library's design, one could use an
4852           associative container mapping each value to a pair consisting of
4853           this data and a priority queue's iterator. Using the embedded
4854           method would need to use two associative containers. Similar
4855           problems might arise in cases where a value can reside
4856           simultaneously in many priority queues.</para>
4857
4858         </section>
4859
4860
4861         <section xml:id="container.priority_queue.details.d">
4862           <info><title>Underlying Data Structure</title></info>
4863
4864           <para>There are three main implementations of priority queues: the
4865           first employs a binary heap, typically one which uses a
4866           sequence; the second uses a tree (or forest of trees), which is
4867           typically less structured than an associative container's tree;
4868           the third simply uses an associative container. These are
4869           shown in the graphic below, in labels A1 and A2, label B, and label C.</para>
4870
4871           <figure>
4872             <title>Underlying Priority-Queue Data-Structures.</title>
4873             <mediaobject>
4874               <imageobject>
4875                 <imagedata align="center" format="PNG" scale="100"
4876                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_priority_queue_different_underlying_dss.png"/>
4877               </imageobject>
4878               <textobject>
4879                 <phrase>Underlying Priority-Queue Data-Structures.</phrase>
4880               </textobject>
4881             </mediaobject>
4882           </figure>
4883
4884           <para>Roughly speaking, any value that is both pushed and popped
4885           from a priority queue must incur a logarithmic expense (in the
4886           amortized sense). Any priority queue implementation that would
4887           avoid this, would violate known bounds on comparison-based
4888           sorting (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
4889           </para>
4890
4891           <para>Most implementations do
4892           not differ in the asymptotic amortized complexity of
4893           <function>push</function> and <function>pop</function> operations, but they differ in
4894           the constants involved, in the complexity of other operations
4895           (e.g., <function>modify</function>), and in the worst-case
4896           complexity of single operations. In general, the more
4897           "structured" an implementation (i.e., the more internal
4898           invariants it possesses) - the higher its amortized complexity
4899           of <function>push</function> and <function>pop</function> operations.</para>
4900
4901           <para>This library implements different algorithms using a
4902           single class: <classname>priority_queue</classname>.
4903           Instantiating the <classname>Tag</classname> template parameter, "selects"
4904           the implementation:</para>
4905
4906           <orderedlist>
4907             <listitem><para>
4908               Instantiating <classname>Tag = binary_heap_tag</classname> creates
4909               a binary heap of the form in represented in the graphic with labels A1 or A2. The former is internally
4910               selected by priority_queue
4911               if <classname>Value_Type</classname> is instantiated by a primitive type
4912               (e.g., an <type>int</type>); the latter is
4913               internally selected for all other types (e.g.,
4914               <classname>std::string</classname>). This implementations is relatively
4915               unstructured, and so has good <classname>push</classname> and <classname>pop</classname>
4916               performance; it is the "best-in-kind" for primitive
4917               types, e.g., <type>int</type>s. Conversely, it has
4918               high worst-case performance, and can support only linear-time
4919             <function>modify</function> and <function>erase</function> operations.</para></listitem>
4920
4921             <listitem><para>Instantiating <classname>Tag =
4922             pairing_heap_tag</classname> creates a pairing heap of the form
4923             in represented by label B in the graphic above. This
4924             implementations too is relatively unstructured, and so has good
4925             <function>push</function> and <function>pop</function>
4926             performance; it is the "best-in-kind" for non-primitive types,
4927             e.g., <classname>std:string</classname>s. It also has very good
4928             worst-case <function>push</function> and
4929             <function>join</function> performance (O(1)), but has high
4930             worst-case <function>pop</function>
4931             complexity.</para></listitem>
4932
4933             <listitem><para>Instantiating <classname>Tag =
4934             binomial_heap_tag</classname> creates a binomial heap of the
4935             form repsented by label B in the graphic above. This
4936             implementations is more structured than a pairing heap, and so
4937             has worse <function>push</function> and <function>pop</function>
4938             performance. Conversely, it has sub-linear worst-case bounds for
4939             <function>pop</function>, e.g., and so it might be preferred in
4940             cases where responsiveness is important.</para></listitem>
4941
4942             <listitem><para>Instantiating <classname>Tag =
4943             rc_binomial_heap_tag</classname> creates a binomial heap of the
4944             form represented in label B above, accompanied by a redundant
4945             counter which governs the trees. This implementations is
4946             therefore more structured than a binomial heap, and so has worse
4947             <function>push</function> and <function>pop</function>
4948             performance. Conversely, it guarantees O(1)
4949             <function>push</function> complexity, and so it might be
4950             preferred in cases where the responsiveness of a binomial heap
4951             is insufficient.</para></listitem>
4952
4953             <listitem><para>Instantiating <classname>Tag =
4954             thin_heap_tag</classname> creates a thin heap of the form
4955             represented by the label B in the graphic above. This
4956             implementations too is more structured than a pairing heap, and
4957             so has worse <function>push</function> and
4958             <function>pop</function> performance. Conversely, it has better
4959             worst-case and identical amortized complexities than a Fibonacci
4960             heap, and so might be more appropriate for some graph
4961             algorithms.</para></listitem>
4962           </orderedlist>
4963
4964           <para>Of course, one can use any order-preserving associative
4965           container as a priority queue, as in the graphic above label C, possibly by creating an adapter class
4966           over the associative container (much as
4967           <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
4968           This has the advantage that no cross-referencing is necessary
4969           at all; the priority queue itself is an associative container.
4970           Most associative containers are too structured to compete with
4971           priority queues in terms of <function>push</function> and <function>pop</function>
4972           performance.</para>
4973
4974
4975
4976         </section>
4977
4978         <section xml:id="container.priority_queue.details.traits">
4979           <info><title>Traits</title></info>
4980
4981           <para>It would be nice if all priority queues could
4982           share exactly the same behavior regardless of implementation. Sadly, this is not possible. Just one for instance is in join operations: joining
4983           two binary heaps might throw an exception (not corrupt
4984           any of the heaps on which it operates), but joining two pairing
4985           heaps is exception free.</para>
4986
4987           <para>Tags and traits are very useful for manipulating generic
4988           types. <classname>__gnu_pbds::priority_queue</classname>
4989           publicly defines <classname>container_category</classname> as one of the tags. Given any
4990           container <classname>Cntnr</classname>, the tag of the underlying
4991           data structure can be found via <classname>typename
4992           Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4993           </para>
4994
4995           <figure>
4996             <title>Priority-Queue Data-Structure Tags.</title>
4997             <mediaobject>
4998               <imageobject>
4999                 <imagedata align="center" format="PNG" scale="100"
5000                            fileref="/mnt/share/src/gcc.svn-trunk/libstdc++-v3/doc/xml/images/pbds_priority_queue_tag_hierarchy.png"/>
5001               </imageobject>
5002               <textobject>
5003                 <phrase>Priority-Queue Data-Structure Tags.</phrase>
5004               </textobject>
5005             </mediaobject>
5006           </figure>
5007
5008
5009           <para>Additionally, a traits mechanism can be used to query a
5010           container type for its attributes. Given any container
5011           <classname>Cntnr</classname>, then <programlisting>__gnu_pbds::container_traits&lt;Cntnr&gt;</programlisting>
5012           is a traits class identifying the properties of the
5013           container.</para>
5014
5015           <para>To find if a container might throw if two of its objects are
5016           joined, one can use
5017           <programlisting>
5018             container_traits&lt;Cntnr&gt;::split_join_can_throw
5019           </programlisting>
5020           </para>
5021
5022           <para>
5023             Different priority-queue implementations have different invalidation guarantees. This is
5024             especially important, since there is no way to access an arbitrary
5025             value of priority queues except for iterators. Similarly to
5026             associative containers, one can use
5027             <programlisting>
5028               container_traits&lt;Cntnr&gt;::invalidation_guarantee
5029             </programlisting>
5030           to get the invalidation guarantee type of a priority queue.</para>
5031
5032           <para>It is easy to understand from the graphic above, what <classname>container_traits&lt;Cntnr&gt;::invalidation_guarantee</classname>
5033           will be for different implementations. All implementations of
5034           type represented by label B have <classname>point_invalidation_guarantee</classname>:
5035           the container can freely internally reorganize the nodes -
5036           range-type iterators are invalidated, but point-type iterators
5037           are always valid. Implementations of type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
5038           the container can freely internally reallocate the array - both
5039           point-type and range-type iterators might be invalidated.</para>
5040
5041           <para>
5042             This has major implications, and constitutes a good reason to avoid
5043             using binary heaps. A binary heap can perform <function>modify</function>
5044             or <function>erase</function> efficiently given a valid point-type
5045             iterator. However, in order to supply it with a valid point-type
5046             iterator, one needs to iterate (linearly) over all
5047             values, then supply the relevant iterator (recall that a
5048             range-type iterator can always be converted to a point-type
5049             iterator). This means that if the number of <function>modify</function> or
5050             <function>erase</function> operations is non-negligible (say
5051             super-logarithmic in the total sequence of operations) - binary
5052             heaps will perform badly.
5053           </para>
5054
5055         </section>
5056
5057       </section> <!-- details -->
5058
5059     </section> <!-- priority_queue -->
5060
5061
5062
5063   </section> <!-- container -->
5064
5065   </section> <!-- design -->
5066
5067
5068
5069   <!-- S04: Test -->
5070   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5071               href="test_policy_data_structures.xml">
5072   </xi:include>
5073
5074   <!-- S05: Reference/Acknowledgments -->
5075   <section xml:id="pbds.ack">
5076     <info><title>Acknowledgments</title></info>
5077     <?dbhtml filename="policy_data_structures_biblio.html"?>
5078
5079     <para>
5080       Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5081       Laboratories), and Benjamin Kosnik (Red Hat).
5082     </para>
5083
5084     <para>
5085       This library was partially written at
5086       <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.haifa.il.ibm.com/">IBM's Haifa Research Labs</link>.
5087       It is based heavily on policy-based design and uses many useful
5088       techniques from Modern C++ Design: Generic Programming and Design
5089       Patterns Applied by Andrei Alexandrescu.
5090     </para>
5091
5092     <para>
5093       Two ideas are borrowed from the SGI-STL implementation:
5094     </para>
5095
5096     <orderedlist>
5097       <listitem>
5098         <para>
5099           The prime-based resize policies use a list of primes taken from
5100           the SGI-STL implementation.
5101         </para>
5102       </listitem>
5103
5104       <listitem>
5105         <para>
5106           The red-black trees contain both a root node and a header node
5107           (containing metadata), connected in a way that forward and
5108           reverse iteration can be performed efficiently.
5109         </para>
5110       </listitem>
5111     </orderedlist>
5112
5113     <para>
5114       Some test utilities borrow ideas from
5115       <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.boost.org/doc/libs/release/libs/timer/index.html">boost::timer</link>.
5116     </para>
5117
5118     <para>
5119       We would like to thank Scott Meyers for useful comments (without
5120       attributing to him any flaws in the design or implementation of the
5121       library).
5122     </para>
5123     <para>We would like to thank Matt Austern for the suggestion to
5124     include tries.</para>
5125   </section>
5126
5127   <!-- S06: Biblio -->
5128   <bibliography xml:id="pbds.biblio">
5129     <info>
5130       <title>
5131         Bibliography
5132       </title>
5133     </info>
5134     <?dbhtml filename="policy_data_structures_biblio.html"?>
5135
5136     <!-- 01 -->
5137     <biblioentry xml:id="biblio.abrahams97exception">
5138       <title>
5139         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5140               xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1997/N1075.pdf">
5141           STL Exception Handling Contract
5142         </link>
5143       </title>
5144       <date>1997</date>
5145
5146       <author>
5147         <personname>
5148           <firstname>
5149             Dave
5150           </firstname>
5151           <surname>
5152             Abrahams
5153           </surname>
5154         </personname>
5155       </author>
5156
5157       <publisher>
5158         <publishername>
5159           ISO SC22/WG21
5160         </publishername>
5161       </publisher>
5162     </biblioentry>
5163
5164
5165     <!-- 02 -->
5166     <biblioentry xml:id="biblio.alexandrescu01modern">
5167       <title>
5168         Modern C++ Design: Generic Programming and Design Patterns Applied
5169       </title>
5170       <date>
5171         2001
5172       </date>
5173
5174       <author>
5175         <personname>
5176           <firstname>
5177             Andrei
5178           </firstname>
5179           <surname>
5180             Alexandrescu
5181           </surname>
5182         </personname>
5183       </author>
5184
5185       <publisher>
5186         <publishername>
5187           Addison-Wesley Publishing Company
5188         </publishername>
5189       </publisher>
5190     </biblioentry>
5191
5192
5193     <!-- 03 -->
5194     <biblioentry xml:id="biblio.andrew04mtf">
5195       <title>
5196         MTF, Bit, and COMB: A Guide to Deterministic and Randomized
5197         Algorithms for the List Update Problem
5198       </title>
5199
5200       <authorgroup>
5201         <author>
5202           <personname>
5203             <firstname>
5204               K.
5205             </firstname>
5206             <surname>
5207               Andrew
5208             </surname>
5209           </personname>
5210         </author>
5211
5212         <author>
5213           <personname>
5214             <firstname>
5215               D.
5216             </firstname>
5217             <surname>
5218               Gleich
5219             </surname>
5220           </personname>
5221         </author>
5222       </authorgroup>
5223     </biblioentry>
5224
5225     <!-- 04 -->
5226     <biblioentry xml:id="biblio.austern00noset">
5227       <title>
5228         Why You Shouldn't Use set - and What You Should Use Instead
5229       </title>
5230       <date>
5231         April, 2000
5232       </date>
5233
5234       <author>
5235         <personname>
5236           <firstname>
5237             Matthew
5238           </firstname>
5239           <surname>
5240             Austern
5241           </surname>
5242         </personname>
5243       </author>
5244
5245       <publisher>
5246         <publishername>
5247           C++ Report
5248         </publishername>
5249       </publisher>
5250     </biblioentry>
5251
5252     <!-- 05 -->
5253     <biblioentry xml:id="biblio.austern01htprop">
5254       <title>
5255         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5256               xlink:href="http://www.open-std.org/JTC1/sc22/wg21/docs/papers/2001/n1326.html">
5257           A Proposal to Add Hashtables to the Standard Library
5258         </link>
5259       </title>
5260       <date>
5261         2001
5262       </date>
5263
5264       <author>
5265         <personname>
5266           <firstname>
5267             Matthew
5268           </firstname>
5269           <surname>
5270             Austern
5271           </surname>
5272         </personname>
5273       </author>
5274
5275       <publisher>
5276         <publishername>
5277           ISO SC22/WG21
5278         </publishername>
5279       </publisher>
5280     </biblioentry>
5281
5282     <!-- 06 -->
5283     <biblioentry xml:id="biblio.austern98segmentedit">
5284       <title>
5285         Segmented iterators and hierarchical algorithms
5286       </title>
5287       <date>
5288         April, 1998
5289       </date>
5290
5291       <author>
5292         <personname>
5293           <firstname>
5294             Matthew
5295           </firstname>
5296           <surname>
5297             Austern
5298           </surname>
5299         </personname>
5300       </author>
5301
5302       <publisher>
5303         <publishername>
5304           Generic Programming
5305         </publishername>
5306       </publisher>
5307     </biblioentry>
5308
5309     <!-- 07 -->
5310     <biblioentry xml:id="biblio.dawestimer">
5311       <title>
5312         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5313               xlink:href="www.boost.org/doc/libs/release/libs/timer/">
5314           Boost Timer Library
5315         </link>
5316       </title>
5317
5318       <author>
5319         <personname>
5320           <firstname>
5321             Beeman
5322           </firstname>
5323           <surname>
5324             Dawes
5325           </surname>
5326         </personname>
5327       </author>
5328
5329       <publisher>
5330         <publishername>
5331           Boost
5332         </publishername>
5333       </publisher>
5334     </biblioentry>
5335
5336     <!-- 08 -->
5337     <biblioentry xml:id="biblio.clearypool">
5338       <title>
5339         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5340               xlink:href="www.boost.org/doc/libs/release/libs/pool/">
5341           Boost Pool Library
5342         </link>
5343       </title>
5344
5345       <author>
5346         <personname>
5347           <firstname>
5348             Stephen
5349           </firstname>
5350           <surname>
5351             Cleary
5352           </surname>
5353         </personname>
5354       </author>
5355
5356       <publisher>
5357         <publishername>
5358           Boost
5359         </publishername>
5360       </publisher>
5361     </biblioentry>
5362
5363
5364     <!-- 09 -->
5365     <biblioentry xml:id="biblio.maddocktraits">
5366       <title>
5367         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5368               xlink:href="www.boost.org/doc/libs/release/libs/type_traits/">
5369           Boost Type Traits Library
5370         </link>
5371       </title>
5372       <authorgroup>
5373         <author>
5374           <personname>
5375             <firstname>
5376               Maddock
5377             </firstname>
5378             <surname>
5379               John
5380             </surname>
5381           </personname>
5382         </author>
5383         <author>
5384           <personname>
5385             <firstname>
5386               Stephen
5387             </firstname>
5388             <surname>
5389               Cleary
5390             </surname>
5391           </personname>
5392         </author>
5393       </authorgroup>
5394       <publisher>
5395         <publishername>
5396           Boost
5397         </publishername>
5398       </publisher>
5399     </biblioentry>
5400
5401     <!-- 10 -->
5402     <biblioentry xml:id="biblio.brodal96priority">
5403       <title>
5404         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5405               xlink:href="http://portal.acm.org/citation.cfm?id=313883">
5406           Worst-case efficient priority queues
5407         </link>
5408       </title>
5409
5410       <author>
5411         <personname>
5412           <firstname>
5413             Gerth
5414           </firstname>
5415           <surname>
5416             Stolting Brodal
5417           </surname>
5418         </personname>
5419       </author>
5420
5421     </biblioentry>
5422
5423     <!-- 11 -->
5424     <biblioentry xml:id="biblio.bulkamayheweff">
5425       <title>
5426         Efficient C++ Programming Techniques
5427       </title>
5428       <date>
5429         1997
5430       </date>
5431
5432       <authorgroup>
5433         <author>
5434           <personname>
5435             <firstname>
5436               D.
5437             </firstname>
5438             <surname>
5439               Bulka
5440             </surname>
5441           </personname>
5442         </author>
5443         <author>
5444           <personname>
5445             <firstname>
5446               D.
5447             </firstname>
5448             <surname>
5449               Mayhew
5450             </surname>
5451           </personname>
5452         </author>
5453       </authorgroup>
5454
5455       <publisher>
5456         <publishername>
5457           Addison-Wesley Publishing Company
5458         </publishername>
5459       </publisher>
5460     </biblioentry>
5461
5462     <!-- 12 -->
5463     <biblioentry xml:id="biblio.clrs2001">
5464       <title>
5465         Introduction to Algorithms, 2nd edition
5466       </title>
5467       <date>
5468         2001
5469       </date>
5470       <authorgroup>
5471         <author>
5472           <personname>
5473             <firstname>
5474               T. H.
5475             </firstname>
5476             <surname>
5477               Cormen
5478             </surname>
5479           </personname>
5480         </author>
5481
5482         <author>
5483           <personname>
5484             <firstname>
5485               C. E.
5486             </firstname>
5487             <surname>
5488               Leiserson
5489             </surname>
5490           </personname>
5491         </author>
5492
5493         <author>
5494           <personname>
5495             <firstname>
5496               R. L.
5497             </firstname>
5498             <surname>
5499               Rivest
5500             </surname>
5501           </personname>
5502         </author>
5503
5504         <author>
5505           <personname>
5506             <firstname>
5507               C.
5508             </firstname>
5509             <surname>
5510               Stein
5511             </surname>
5512           </personname>
5513         </author>
5514       </authorgroup>
5515       <publisher>
5516         <publishername>
5517           MIT Press
5518         </publishername>
5519       </publisher>
5520     </biblioentry>
5521
5522     <!-- 13 -->
5523     <biblioentry xml:id="biblio.dubhashi98neg">
5524       <title>
5525         Balls and bins: A study in negative dependence
5526       </title>
5527       <date>
5528         1998
5529       </date>
5530       <authorgroup>
5531         <author>
5532           <personname>
5533             <firstname>
5534               D.
5535             </firstname>
5536             <surname>
5537               Dubashi
5538             </surname>
5539           </personname>
5540         </author>
5541         <author>
5542           <personname>
5543             <firstname>
5544               D.
5545             </firstname>
5546             <surname>
5547               Ranjan
5548             </surname>
5549           </personname>
5550         </author>
5551       </authorgroup>
5552
5553       <publisher>
5554         <publishername>
5555           Random Structures and Algorithms 13
5556         </publishername>
5557       </publisher>
5558     </biblioentry>
5559
5560
5561     <!-- 14 -->
5562     <biblioentry xml:id="biblio.fagin79extendible">
5563       <title>
5564         Extendible hashing - a fast access method for dynamic files
5565       </title>
5566       <date>
5567         1979
5568       </date>
5569
5570       <authorgroup>
5571         <author>
5572           <personname>
5573             <firstname>
5574               R.
5575             </firstname>
5576             <surname>
5577               Fagin
5578             </surname>
5579           </personname>
5580         </author>
5581         <author>
5582           <personname>
5583             <firstname>
5584               J.
5585             </firstname>
5586             <surname>
5587               Nievergelt
5588             </surname>
5589           </personname>
5590         </author>
5591         <author>
5592           <personname>
5593             <firstname>
5594               N.
5595             </firstname>
5596             <surname>
5597               Pippenger
5598             </surname>
5599           </personname>
5600         </author>
5601         <author>
5602           <personname>
5603             <firstname>
5604               H. R.
5605             </firstname>
5606             <surname>
5607               Strong
5608             </surname>
5609           </personname>
5610         </author>
5611       </authorgroup>
5612
5613       <publisher>
5614         <publishername>
5615           ACM Trans. Database Syst. 4
5616         </publishername>
5617       </publisher>
5618     </biblioentry>
5619
5620
5621
5622     <!-- 15 -->
5623     <biblioentry xml:id="biblio.filliatre2000ptset">
5624       <title>
5625         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5626               xlink:href="http://cristal.inria.fr/~frisch/icfp06_contest/advtr/applyOmatic/ptset.ml">
5627           Ptset: Sets of integers implemented as Patricia trees
5628         </link>
5629       </title>
5630
5631       <date>
5632         2000
5633       </date>
5634
5635       <author>
5636         <personname>
5637           <firstname>
5638             Jean-Christophe
5639           </firstname>
5640           <surname>
5641             Filliatre
5642           </surname>
5643         </personname>
5644       </author>
5645     </biblioentry>
5646
5647
5648
5649     <!-- 16 -->
5650     <biblioentry xml:id="biblio.fredman86pairing">
5651       <title>
5652         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5653               xlink:href="http://www.cs.cmu.edu/~sleator/papers/pairing-heaps.pdf">
5654           The pairing heap: a new form of self-adjusting heap
5655         </link>
5656       </title>
5657       <date>
5658         1986
5659       </date>
5660       <authorgroup>
5661         <author>
5662           <personname>
5663             <firstname>
5664               M. L.
5665             </firstname>
5666             <surname>
5667               Fredman
5668             </surname>
5669           </personname>
5670         </author>
5671         <author>
5672           <personname>
5673             <firstname>
5674               R.
5675             </firstname>
5676             <surname>
5677               Sedgewick
5678             </surname>
5679           </personname>
5680         </author>
5681         <author>
5682           <personname>
5683             <firstname>
5684               D. D.
5685             </firstname>
5686             <surname>
5687               Sleator
5688             </surname>
5689           </personname>
5690         </author>
5691         <author>
5692           <personname>
5693             <firstname>
5694               R. E.
5695             </firstname>
5696             <surname>
5697               Tarjan
5698             </surname>
5699           </personname>
5700         </author>
5701       </authorgroup>
5702     </biblioentry>
5703
5704
5705     <!-- 17 -->
5706     <biblioentry xml:id="biblio.gof">
5707       <title>
5708         Design Patterns - Elements of Reusable Object-Oriented Software
5709       </title>
5710       <date>
5711         1995
5712       </date>
5713       <authorgroup>
5714         <author>
5715           <personname>
5716             <firstname>
5717               E.
5718             </firstname>
5719             <surname>
5720               Gamma
5721             </surname>
5722           </personname>
5723         </author>
5724         <author>
5725           <personname>
5726             <firstname>
5727               R.
5728             </firstname>
5729             <surname>
5730               Helm
5731             </surname>
5732           </personname>
5733         </author>
5734         <author>
5735           <personname>
5736             <firstname>
5737               R.
5738             </firstname>
5739             <surname>
5740               Johnson
5741             </surname>
5742           </personname>
5743         </author>
5744         <author>
5745           <personname>
5746             <firstname>
5747               J.
5748             </firstname>
5749             <surname>
5750               Vlissides
5751             </surname>
5752           </personname>
5753         </author>
5754       </authorgroup>
5755       <publisher>
5756         <publishername>
5757           Addison-Wesley Publishing Company
5758         </publishername>
5759       </publisher>
5760     </biblioentry>
5761
5762
5763     <!-- 18 -->
5764     <biblioentry xml:id="biblio.garg86order">
5765       <title>
5766         Order-preserving key transformations
5767       </title>
5768       <date>
5769         1986
5770       </date>
5771       <authorgroup>
5772         <author>
5773           <personname>
5774             <firstname>
5775               A. K.
5776             </firstname>
5777             <surname>
5778               Garg
5779             </surname>
5780           </personname>
5781         </author>
5782         <author>
5783           <personname>
5784             <firstname>
5785               C. C.
5786             </firstname>
5787             <surname>
5788               Gotlieb
5789             </surname>
5790           </personname>
5791         </author>
5792       </authorgroup>
5793
5794       <publisher>
5795         <publishername>
5796           Trans. Database Syst. 11
5797         </publishername>
5798       </publisher>
5799     </biblioentry>
5800
5801     <!-- 19 -->
5802     <biblioentry xml:id="biblio.hyslop02making">
5803       <title>
5804         Making a real hash of things
5805       </title>
5806       <date>
5807         May 2002
5808       </date>
5809       <authorgroup>
5810         <author>
5811           <personname>
5812             <firstname>
5813               J.
5814             </firstname>
5815             <surname>
5816               Hyslop
5817             </surname>
5818           </personname>
5819         </author>
5820         <author>
5821           <personname>
5822             <firstname>
5823               Herb
5824             </firstname>
5825             <surname>
5826               Sutter
5827             </surname>
5828           </personname>
5829         </author>
5830       </authorgroup>
5831
5832       <publisher>
5833         <publishername>
5834           C++ Report
5835         </publishername>
5836       </publisher>
5837     </biblioentry>
5838
5839
5840     <!-- 20 -->
5841     <biblioentry xml:id="biblio.jossutis01stl">
5842       <title>
5843         The C++ Standard Library - A Tutorial and Reference
5844       </title>
5845       <date>
5846         2001
5847       </date>
5848
5849       <author>
5850         <personname>
5851           <firstname>
5852             N. M.
5853           </firstname>
5854           <surname>
5855             Jossutis
5856           </surname>
5857         </personname>
5858       </author>
5859       <publisher>
5860         <publishername>
5861           Addison-Wesley Publishing Company
5862         </publishername>
5863       </publisher>
5864     </biblioentry>
5865
5866     <!-- 21 -->
5867     <biblioentry xml:id="biblio.kt99fat_heaps">
5868       <title>
5869         <link xmlns:xlink="http://www.w3.org/1999/xlink"
5870               xlink:href="http://www.cs.princeton.edu/research/techreps/TR-597-99">
5871           New Heap Data Structures
5872         </link>
5873       </title>
5874       <date>
5875         1999
5876       </date>
5877
5878       <authorgroup>
5879         <author>
5880           <personname>
5881             <firstname>
5882               Haim
5883             </firstname>
5884             <surname>
5885               Kaplan
5886             </surname>
5887           </personname>
5888         </author>
5889         <author>
5890           <personname>
5891             <firstname>
5892               Robert E.
5893             </firstname>
5894             <surname>
5895               Tarjan
5896             </surname>
5897           </personname>
5898         </author>
5899       </authorgroup>
5900     </biblioentry>
5901
5902
5903     <!-- 22 -->
5904     <biblioentry xml:id="biblio.kleft00sets">
5905       <title>
5906         Are Set Iterators Mutable or Immutable?
5907       </title>
5908       <date>
5909         October 2000
5910       </date>
5911       <authorgroup>
5912         <author>
5913           <personname>
5914             <firstname>
5915               Angelika
5916             </firstname>
5917             <surname>
5918               Langer
5919             </surname>
5920           </personname>
5921         </author>
5922
5923         <author>
5924           <personname>
5925             <firstname>
5926               Klaus
5927             </firstname>
5928             <surname>
5929               Kleft
5930             </surname>
5931           </personname>
5932         </author>
5933       </authorgroup>
5934
5935       <publisher>
5936         <publishername>
5937           C/C++ Users Jornal
5938         </publishername>
5939       </publisher>
5940     </biblioentry>
5941
5942     <!-- 23 -->
5943     <biblioentry xml:id="biblio.knuth98sorting">
5944       <title>
5945         The Art of Computer Programming - Sorting and Searching
5946       </title>
5947       <date>
5948         1998
5949       </date>
5950
5951       <author>
5952         <personname>
5953           <firstname>
5954             D. E.
5955           </firstname>
5956           <surname>
5957             Knuth
5958           </surname>
5959         </personname>
5960       </author>
5961
5962       <publisher>
5963         <publishername>
5964           Addison-Wesley Publishing Company
5965         </publishername>
5966       </publisher>
5967     </biblioentry>
5968
5969     <!-- 24 -->
5970     <biblioentry xml:id="biblio.liskov98data">
5971       <title>
5972         Data abstraction and hierarchy
5973       </title>
5974       <date>
5975         May 1998
5976       </date>
5977
5978       <author>
5979         <personname>
5980           <firstname>
5981             B.
5982           </firstname>
5983           <surname>
5984             Liskov
5985           </surname>
5986         </personname>
5987       </author>
5988
5989       <publisher>
5990         <publishername>
5991           SIGPLAN Notices 23
5992         </publishername>
5993       </publisher>
5994     </biblioentry>
5995
5996     <!-- 25 -->
5997     <biblioentry xml:id="biblio.litwin80lh">
5998       <title>
5999         Linear hashing: A new tool for file and table addressing
6000       </title>
6001       <date>
6002         June 1980
6003       </date>
6004
6005       <author>
6006         <personname>
6007           <firstname>
6008             W.
6009           </firstname>
6010           <surname>
6011             Litwin
6012           </surname>
6013         </personname>
6014       </author>
6015
6016       <publisher>
6017         <publishername>
6018           Proceedings of International Conference on Very Large Data Bases
6019         </publishername>
6020       </publisher>
6021     </biblioentry>
6022
6023     <!-- 26 -->
6024     <biblioentry xml:id="biblio.maverik_lowerbounds">
6025       <title>
6026         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6027               xlink:href="http://magic.aladdin.cs.cmu.edu/2005/08/01/deamortization-part-2-binomial-heaps">
6028           Deamortization - Part 2: Binomial Heaps
6029         </link>
6030       </title>
6031       <date>
6032         2005
6033       </date>
6034
6035       <author>
6036         <personname>
6037           <firstname>
6038             Maverik
6039           </firstname>
6040           <surname>
6041             Woo
6042           </surname>
6043         </personname>
6044       </author>
6045     </biblioentry>
6046
6047     <!-- 27 -->
6048     <biblioentry xml:id="biblio.meyers96more">
6049       <title>
6050         More Effective C++: 35 New Ways to Improve Your Programs and Designs
6051       </title>
6052       <date>
6053         1996
6054       </date>
6055
6056       <author>
6057         <personname>
6058           <firstname>
6059             Scott
6060           </firstname>
6061           <surname>
6062             Meyers
6063           </surname>
6064         </personname>
6065       </author>
6066
6067       <publisher>
6068         <publishername>
6069           Addison-Wesley Publishing Company
6070         </publishername>
6071       </publisher>
6072     </biblioentry>
6073
6074     <!-- 28 -->
6075     <biblioentry xml:id="biblio.meyers00nonmember">
6076       <title>
6077         How Non-Member Functions Improve Encapsulation
6078       </title>
6079       <date>
6080         2000
6081       </date>
6082
6083       <author>
6084         <personname>
6085           <firstname>
6086             Scott
6087           </firstname>
6088           <surname>
6089             Meyers
6090           </surname>
6091         </personname>
6092       </author>
6093
6094       <publisher>
6095         <publishername>
6096           C/C++ Users Journal
6097         </publishername>
6098       </publisher>
6099     </biblioentry>
6100
6101     <!-- 29 -->
6102     <biblioentry xml:id="biblio.meyers01stl">
6103       <title>
6104         Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library
6105       </title>
6106       <date>
6107         2001
6108       </date>
6109
6110       <author>
6111         <personname>
6112           <firstname>
6113             Scott
6114           </firstname>
6115           <surname>
6116             Meyers
6117           </surname>
6118         </personname>
6119       </author>
6120
6121       <publisher>
6122         <publishername>
6123           Addison-Wesley Publishing Company
6124         </publishername>
6125       </publisher>
6126     </biblioentry>
6127
6128     <!-- 30 -->
6129     <biblioentry xml:id="biblio.meyers02both">
6130       <title>
6131         Class Template, Member Template - or Both?
6132       </title>
6133       <date>
6134         2003
6135       </date>
6136
6137       <author>
6138         <personname>
6139           <firstname>
6140             Scott
6141           </firstname>
6142           <surname>
6143             Meyers
6144           </surname>
6145         </personname>
6146       </author>
6147
6148       <publisher>
6149         <publishername>
6150           C/C++ Users Journal
6151         </publishername>
6152       </publisher>
6153     </biblioentry>
6154
6155     <!-- 31 -->
6156     <biblioentry xml:id="biblio.motwani95random">
6157       <title>
6158         Randomized Algorithms
6159       </title>
6160       <date>
6161         2003
6162       </date>
6163       <authorgroup>
6164         <author>
6165           <personname>
6166             <firstname>
6167               R.
6168             </firstname>
6169             <surname>
6170               Motwani
6171             </surname>
6172           </personname>
6173         </author>
6174         <author>
6175           <personname>
6176             <firstname>
6177               P.
6178             </firstname>
6179             <surname>
6180               Raghavan
6181             </surname>
6182           </personname>
6183         </author>
6184       </authorgroup>
6185       <publisher>
6186         <publishername>
6187           Cambridge University Press
6188         </publishername>
6189       </publisher>
6190     </biblioentry>
6191
6192
6193     <!-- 32 -->
6194     <biblioentry xml:id="biblio.mscom">
6195       <title>
6196         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6197               xlink:href="http://www.microsoft.com/com">
6198           COM: Component Model Object Technologies
6199         </link>
6200       </title>
6201       <publisher>
6202         <publishername>
6203           Microsoft
6204         </publishername>
6205       </publisher>
6206     </biblioentry>
6207
6208     <!-- 33 -->
6209     <biblioentry xml:id="biblio.musser95rationale">
6210       <title>
6211         Rationale for Adding Hash Tables to the C++ Standard Template Library
6212       </title>
6213       <date>
6214         1995
6215       </date>
6216
6217       <author>
6218         <personname>
6219           <firstname>
6220             David R.
6221           </firstname>
6222           <surname>
6223             Musser
6224           </surname>
6225         </personname>
6226       </author>
6227
6228     </biblioentry>
6229
6230     <!-- 35 -->
6231     <biblioentry xml:id="biblio.musser96stltutorial">
6232       <title>
6233         STL Tutorial and Reference Guide
6234       </title>
6235       <date>
6236         1996
6237       </date>
6238
6239       <authorgroup>
6240         <author>
6241           <personname>
6242             <firstname>
6243               David R.
6244             </firstname>
6245             <surname>
6246               Musser
6247             </surname>
6248           </personname>
6249         </author>
6250         <author>
6251           <personname>
6252             <firstname>
6253               A.
6254             </firstname>
6255             <surname>
6256               Saini
6257             </surname>
6258           </personname>
6259         </author>
6260       </authorgroup>
6261       <publisher>
6262         <publishername>
6263           Addison-Wesley Publishing Company
6264         </publishername>
6265       </publisher>
6266
6267     </biblioentry>
6268
6269
6270     <!-- 36 -->
6271     <biblioentry xml:id="biblio.nelson96stlpq">
6272       <title>
6273         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6274               xlink:href="http://www.dogma.net/markn/articles/pq_stl/priority.htm">Priority Queues and the STL
6275         </link>
6276       </title>
6277       <date>
6278         January 1996
6279       </date>
6280
6281       <author>
6282         <personname>
6283           <firstname>
6284             Mark
6285           </firstname>
6286           <surname>
6287             Nelson
6288           </surname>
6289         </personname>
6290       </author>
6291
6292       <publisher>
6293         <publishername>
6294           Dr. Dobbs Journal
6295         </publishername>
6296       </publisher>
6297     </biblioentry>
6298
6299
6300     <!-- 37 -->
6301     <biblioentry xml:id="biblio.okasaki98mereable">
6302       <title>
6303         Fast mergeable integer maps
6304       </title>
6305       <date>
6306         September 1998
6307       </date>
6308       <authorgroup>
6309         <author>
6310           <personname>
6311             <firstname>
6312               C.
6313             </firstname>
6314             <surname>
6315               Okasaki
6316             </surname>
6317           </personname>
6318         </author>
6319         <author>
6320           <personname>
6321             <firstname>
6322               A.
6323             </firstname>
6324             <surname>
6325               Gill
6326             </surname>
6327           </personname>
6328         </author>
6329       </authorgroup>
6330       <publisher>
6331         <publishername>
6332           In Workshop on ML
6333         </publishername>
6334       </publisher>
6335     </biblioentry>
6336
6337     <!-- 38 -->
6338     <biblioentry xml:id="biblio.sgi_stl">
6339       <title>
6340         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6341               xlink:href="http://www.sgi.com/tech/stl">
6342           Standard Template Library Programmer's Guide
6343         </link>
6344       </title>
6345       <author>
6346         <personname>
6347           <firstname>
6348             Matt
6349           </firstname>
6350           <surname>
6351             Austern
6352           </surname>
6353         </personname>
6354       </author>
6355
6356       <publisher>
6357         <publishername>
6358           SGI
6359         </publishername>
6360       </publisher>
6361     </biblioentry>
6362
6363     <!-- 39 -->
6364     <biblioentry xml:id="biblio.select_man">
6365       <title>
6366         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6367               xlink:href="http://www.scit.wlv.ac.uk/cgi-bin/mansec?3C+select">
6368           select
6369         </link>
6370       </title>
6371     </biblioentry>
6372
6373
6374     <!-- 40 -->
6375     <biblioentry xml:id="biblio.sleator84amortized">
6376       <title>
6377         Amortized Efficiency of List Update Problems
6378       </title>
6379       <date>
6380         1984
6381       </date>
6382       <authorgroup>
6383         <author>
6384           <personname>
6385             <firstname>
6386               D. D.
6387             </firstname>
6388             <surname>
6389               Sleator
6390             </surname>
6391           </personname>
6392         </author>
6393
6394         <author>
6395           <personname>
6396             <firstname>
6397               R. E.
6398             </firstname>
6399             <surname>
6400               Tarjan
6401             </surname>
6402           </personname>
6403         </author>
6404       </authorgroup>
6405
6406       <publisher>
6407         <publishername>
6408           ACM Symposium on Theory of Computing
6409         </publishername>
6410       </publisher>
6411     </biblioentry>
6412
6413     <!-- 41 -->
6414     <biblioentry xml:id="biblio.sleator85self">
6415       <title>
6416         Self-Adjusting Binary Search Trees
6417       </title>
6418       <date>
6419         1985
6420       </date>
6421
6422       <authorgroup>
6423         <author>
6424           <personname>
6425             <firstname>
6426               D. D.
6427             </firstname>
6428             <surname>
6429               Sleator
6430             </surname>
6431           </personname>
6432         </author>
6433
6434         <author>
6435           <personname>
6436             <firstname>
6437               R. E.
6438             </firstname>
6439             <surname>
6440               Tarjan
6441             </surname>
6442           </personname>
6443         </author>
6444       </authorgroup>
6445
6446       <publisher>
6447         <publishername>
6448           ACM Symposium on Theory of Computing
6449         </publishername>
6450       </publisher>
6451     </biblioentry>
6452
6453     <!-- 42 -->
6454     <biblioentry xml:id="biblio.stepanov94standard">
6455       <title>
6456         The Standard Template Library
6457       </title>
6458       <date>
6459         1984
6460       </date>
6461       <authorgroup>
6462         <author>
6463           <personname>
6464             <firstname>
6465               A. A.
6466             </firstname>
6467             <surname>
6468               Stepanov
6469             </surname>
6470           </personname>
6471         </author>
6472         <author>
6473           <personname>
6474             <firstname>
6475               M.
6476             </firstname>
6477             <surname>
6478               Lee
6479             </surname>
6480           </personname>
6481         </author>
6482       </authorgroup>
6483     </biblioentry>
6484
6485     <!-- 43 -->
6486     <biblioentry xml:id="biblio.stroustrup97cpp">
6487       <title>
6488         The C++ Programming Langugage
6489       </title>
6490       <date>
6491         1997
6492       </date>
6493
6494       <author>
6495         <personname>
6496           <firstname>
6497             Bjarne
6498           </firstname>
6499           <surname>
6500             Stroustrup
6501           </surname>
6502         </personname>
6503       </author>
6504
6505       <publisher>
6506         <publishername>
6507           Addison-Wesley Publishing Company
6508         </publishername>
6509       </publisher>
6510     </biblioentry>
6511
6512     <!-- 44 -->
6513     <biblioentry xml:id="biblio.vandevoorde2002cpptemplates">
6514       <title>
6515         C++ Templates: The Complete Guide
6516       </title>
6517       <date>
6518         2002
6519       </date>
6520       <authorgroup>
6521         <author>
6522           <personname>
6523             <firstname>
6524               D.
6525             </firstname>
6526             <surname>
6527               Vandevoorde
6528             </surname>
6529           </personname>
6530         </author>
6531
6532         <author>
6533           <personname>
6534             <firstname>
6535               N. M.
6536             </firstname>
6537             <surname>
6538               Josuttis
6539             </surname>
6540           </personname>
6541         </author>
6542       </authorgroup>
6543       <publisher>
6544         <publishername>
6545           Addison-Wesley Publishing Company
6546         </publishername>
6547       </publisher>
6548     </biblioentry>
6549
6550
6551     <!-- 45 -->
6552     <biblioentry xml:id="biblio.wickland96thirty">
6553       <title>
6554         <link xmlns:xlink="http://www.w3.org/1999/xlink"
6555               xlink:href="http://myweb.wvnet.edu/~gsa00121/books/amongdead30.zip">
6556           Thirty Years Among the Dead
6557         </link>
6558       </title>
6559       <date>
6560         1996
6561       </date>
6562
6563       <author>
6564         <personname>
6565           <firstname>
6566             C. A.
6567           </firstname>
6568           <surname>
6569             Wickland
6570           </surname>
6571         </personname>
6572       </author>
6573
6574       <publisher>
6575         <publishername>
6576           National Psychological Institute
6577         </publishername>
6578       </publisher>
6579     </biblioentry>
6580
6581
6582   </bibliography>
6583
6584 </chapter>