gcc/cp/gxxint.texi

   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename g++int.info
   4 @settitle G++ internals
   5 @setchapternewpage odd
   6 @c %**end of header
   7
   8 @node Top, Limitations of g++, (dir), (dir)
   9 @chapter Internal Architecture of the Compiler
  10
  11 This is meant to describe the C++ front-end for gcc in detail.
  12 Questions and comments to Mike Stump @code{<mrs@@cygnus.com>}.
  13
  14 @menu
  15 * Limitations of g++::
  16 * Routines::
  17 * Implementation Specifics::
  18 * Glossary::
  19 * Macros::
  20 * Typical Behavior::
  21 * Coding Conventions::
  22 * Templates::
  23 * Access Control::
  24 * Error Reporting::
  25 * Parser::
  26 * Copying Objects::
  27 * Exception Handling::
  28 * Free Store::
  29 * Mangling::  Function name mangling for C++ and Java
  30 * Concept Index::
  31 @end menu
  32
  33 @node Limitations of g++, Routines, Top, Top
  34 @section Limitations of g++
  35
  36 @itemize @bullet
  37 @item
  38 Limitations on input source code: 240 nesting levels with the parser
  39 stack size (YYSTACKSIZE) set to 500 (the default); each nesting level
  40 requires around 16.4k of swap space.  The parser needs about 2.09 *
  41 number of nesting levels worth of stack space.
  42
  43 @cindex pushdecl_class_level
  44 @item
  45 I suspect there are other uses of pushdecl_class_level that do not call
  46 set_identifier_type_value in tandem with the call to
  47 pushdecl_class_level.  It would seem to be an omission.
  48
  49 @cindex access checking
  50 @item
  51 Access checking is unimplemented for nested types.
  52
  53 @cindex @code{volatile}
  54 @item
  55 @code{volatile} is not implemented in general.
  56
  57 @end itemize
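
The figures in the first item imply a simple linear estimate.  The
following is a minimal sketch, assuming the quoted constants (2.09 stack
slots and 16.4k of swap per nesting level) hold exactly, which the text
only claims approximately.  The function and constant names are
hypothetical and do not appear in the compiler.

```cpp
// Both constants are approximations taken from the text above, not values
// read from the parser itself.
constexpr double kStackSlotsPerLevel = 2.09;  // parser stack slots per nesting level
constexpr double kSwapKPerLevel = 16.4;       // swap space (in k) per nesting level

// Deepest nesting that fits in a parser stack of the given size.
constexpr int max_nesting_levels(int yystacksize) {
  return static_cast<int>(yystacksize / kStackSlotsPerLevel);
}

// Approximate swap space, in k, consumed at a given nesting depth.
constexpr double swap_needed_k(int levels) {
  return levels * kSwapKPerLevel;
}
```

With the default YYSTACKSIZE of 500, @code{max_nesting_levels} yields 239,
which is consistent with the figure of roughly 240 levels quoted above.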
  58
  59 @node Routines, Implementation Specifics, Limitations of g++, Top
  60 @section Routines
  61
  62 This section describes some of the routines used in the C++ front-end.
  63
  64 @code{build_vtable} and @code{prepare_fresh_vtable} are used only within
  65 the @file{cp-class.c} file, and only in @code{finish_struct} and
  66 @code{modify_vtable_entries}.
  67
  68 @code{build_vtable}, @code{prepare_fresh_vtable}, and
  69 @code{finish_struct} are the only routines that set @code{DECL_VPARENT}.
  70
  71 @code{finish_struct} can steal the virtual function table from parents;
  72 this prevents @code{related_vslot} from working.  When
  73 @code{finish_struct} steals, we know that
  74
  75 @example
  76 get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
  77 @end example
  78
  79 @noindent
  80 will get the related binfo.
  81
  82 @code{layout_basetypes} does something with the VIRTUALS.
  83
  84 Supposedly (according to Tiemann) most of the breadth first searching
  85 done, like in @code{get_base_distance} and in @code{get_binfo}, was not
  86 because of any design decision.  I have since found out that at least one
  87 part of the compiler needs the notion of depth first binfo searching.  I
  88 am going to try to convert the whole thing; it should just work.  The
  89 term left-most refers to the depth first left-most node.  It uses
  90 @code{MAIN_VARIANT == type} as the condition to get left-most, because
  91 the things that have @code{BINFO_OFFSET}s of zero are shared and will
  92 have themselves as their own @code{MAIN_VARIANT}s.  The non-shared right
  93 ones are copies of the left-most one; hence if it is its own
  94 @code{MAIN_VARIANT}, we know it IS a left-most one, and if it is not, it
  95 is a non-left-most one.
  96
  97 @code{get_base_distance}'s path and distance matter in its use in:
  98
  99 @itemize @bullet
 100 @item
 101 @code{prepare_fresh_vtable} (the code is probably wrong)
 102 @item
 103 @code{init_vfields} depends upon distance, probably in a safe way;
 104 @code{build_offset_ref} might use partial paths to do further lookups;
 105 @code{hack_identifier} is probably not properly checking access.
 106
 107 @item
 108 @code{get_first_matching_virtual} probably should check for
 109 @code{get_base_distance} returning -2.
 110
 111 @item
 112 @code{resolve_offset_ref} should be called in a more deterministic
 113 manner.  Right now, it is called in some random contexts, like for
 114 arguments at @code{build_method_call} time, @code{default_conversion}
 115 time, @code{convert_arguments} time, @code{build_unary_op} time,
 116 @code{build_c_cast} time, @code{build_modify_expr} time,
 117 @code{convert_for_assignment} time, and
 118 @code{convert_for_initialization} time.
 119
 120 But, there are still more contexts it needs to be called in, one was the
 121 ever simple:
 122
 123 @example
 124 if (obj.*pmi != 7)
 125    @dots{}
 126 @end example
 127
 128 Seems that the problems were due to the fact that @code{TREE_TYPE} of
 129 the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
 130 of the referent (like @code{INTEGER_TYPE}).  This problem was fixed by
 131 changing @code{default_conversion} to check @code{TREE_CODE (x)},
 132 instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
 133 was @code{OFFSET_TYPE}.
 134
 135 @end itemize
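
For reference, the @code{obj.*pmi} case mentioned in the last item can be
written as a complete fragment.  This is only an illustrative sketch:
@code{Obj}, @code{value}, and @code{differs_from_seven} are hypothetical
names, and the code shows the source-level construct rather than the
compiler's internal handling of it.

```cpp
// A class with one int member that a pointer to member can designate.
struct Obj {
  int value;
};

// `pmi` is a pointer to an int member of Obj.  The expression `obj.*pmi`
// is the construct that the text above says is represented by an
// OFFSET_REF whose type is that of the referent (here, int).
bool differs_from_seven(const Obj &obj, int Obj::*pmi) {
  if (obj.*pmi != 7)
    return true;
  return false;
}
```

A caller would pass @code{&Obj::value} as the second argument.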
 136
 137 @node Implementation Specifics, Glossary, Routines, Top
 138 @section Implementation Specifics
 139
 140 @itemize @bullet
 141 @item Explicit Initialization
 142
 143 The global list @code{current_member_init_list} contains the list of
 144 mem-initializers specified in a constructor declaration.  For example:
 145
 146 @example
 147 foo::foo() : a(1), b(2) @{@}
 148 @end example
 149
 150 @noindent
 151 will initialize @samp{a} with 1 and @samp{b} with 2.
 152 @code{expand_member_init} places each initialization (a with 1) on the
 153 global list.  Then, when the fndecl is being processed,
 154 @code{emit_base_init} runs down the list, initializing them.  It used to
 155 be the case that g++ first ran down @code{current_member_init_list},
 156 then ran down the list of members initializing the ones that weren't
 157 explicitly initialized.  Things were rewritten to perform the
 158 initializations in order of declaration in the class.  So, for the above
 159 example, @samp{a} and @samp{b} will be initialized in the order that
 160 they were declared:
 161
 162 @example
 163 class foo @{ public: int b; int a; foo (); @};
 164 @end example
 165
 166 @noindent
 167 Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
 168 initialized with 1, regardless of how they're listed in the mem-initializer.
 169
 170 @item Argument Matching
 171
 172 In early 1993, the argument matching scheme in @sc{gnu} C++ changed
 173 significantly.  The original code was completely replaced with a new
 174 method that will, hopefully, be easier to understand and make fixing
 175 specific cases much easier.
 176
 177 The @samp{-fansi-overloading} option is used to enable the new code; at
 178 some point in the future, it will become the default behavior of the
 179 compiler.
 180
 181 The file @file{cp-call.c} contains all of the new work, in the functions
 182 @code{rank_for_overload}, @code{compute_harshness},
 183 @code{compute_conversion_costs}, and @code{ideal_candidate}.
 184
 185 Instead of using obscure numerical values, the quality of an argument
 186 match is now represented by clear, individual codes.  The new data
 187 structure @code{struct harshness} (it used to be an @code{unsigned}
 188 number) contains:
 189
 190 @enumerate a
 191 @item the @samp{code} field, to signify what was involved in matching two
 192 arguments;
 193 @item the @samp{distance} field, used in situations where inheritance
 194 decides which function should be called (one is ``closer'' than
 195 another);
 196 @item and the @samp{int_penalty} field, used by some codes as a tie-breaker.
 197 @end enumerate
 198
 199 The @samp{code} field is a number with a given bit set for each type of
 200 code, OR'd together.  The new codes are:
 201
 202 @itemize @bullet
 203 @item @code{EVIL_CODE}
 204 The argument was not a permissible match.
 205
 206 @item @code{CONST_CODE}
 207 Currently, this is only used by @code{compute_conversion_costs}, to
 208 distinguish when a non-@code{const} member function is called from a
 209 @code{const} member function.
 210
 211 @item @code{ELLIPSIS_CODE}
 212 A match against an ellipsis @samp{...} is considered worse than all others.
 213
 214 @item @code{USER_CODE}
 215 Used for a match involving a user-defined conversion.
 216
 217 @item @code{STD_CODE}
 218 A match involving a standard conversion.
 219
 220 @item @code{PROMO_CODE}
 221 A match involving an integral promotion.  For these, the
 222 @code{int_penalty} field is used to handle the ARM's rule (XXX cite)
 223 that a smaller @code{unsigned} type should promote to an @code{int}, not
 224 to an @code{unsigned int}.
 225
 226 @item @code{QUAL_CODE}
 227 Used to mark use of qualifiers like @code{const} and @code{volatile}.
 228
 229 @item @code{TRIVIAL_CODE}
 230 Used for trivial conversions.  The @samp{int_penalty} field is used by
 231 @code{convert_harshness} to communicate further penalty information back
 232 to @code{build_overload_call_real} when deciding which function should
 233 be called.
 234 @end itemize
 235
 236 The functions @code{convert_to_aggr} and @code{build_method_call} use
 237 @code{compute_conversion_costs} to rate each argument's suitability for
 238 a given candidate function (that's how we get the list of candidates for
 239 @code{ideal_candidate}).
 240
 241 @item The Explicit Keyword
 242
 243 The use of @code{explicit} on a constructor is used by @code{grokdeclarator}
 244 to set the field @code{DECL_NONCONVERTING_P}.  That value is used by
 245 @code{build_method_call} and @code{build_user_type_conversion_1} to decide
 246 if a particular constructor should be used as a candidate for conversions.
 247
 248 @end itemize
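
Two of the behaviors described in the items above can be observed from
user code.  The sketch below is illustrative only: @code{note},
@code{init_log}, @code{init_order}, @code{Implicit}, and @code{Explicit}
are hypothetical names, and the checks reflect standard C++ behavior, not
the internal routines named above.

```cpp
#include <string>
#include <type_traits>

// Records the order in which members are initialized.
static std::string init_log;

static int note(char name, int value) {
  init_log += name;
  return value;
}

// Same shape as the `foo' example above: b is declared before a, but the
// mem-initializer list names a first.  GCC reports the mismatch under
// -Wreorder; the declaration order still decides.
class foo {
public:
  int b;
  int a;
  foo() : a(note('a', 1)), b(note('b', 2)) {}
};

// Returns the member names in the order they were initialized.
static std::string init_order() {
  init_log.clear();
  foo f;
  (void)f;
  return init_log;
}

// An explicit constructor is not a candidate for implicit conversions,
// although it can still be used for direct construction.
struct Implicit { Implicit(int) {} };
struct Explicit { explicit Explicit(int) {} };

static_assert(std::is_convertible<int, Implicit>::value,
              "non-explicit constructor allows implicit conversion");
static_assert(!std::is_convertible<int, Explicit>::value,
              "explicit constructor is skipped for implicit conversion");
static_assert(std::is_constructible<Explicit, int>::value,
              "explicit constructor still permits direct construction");
```

Calling @code{init_order} returns the string @code{ba}, matching the
description that @samp{b} is initialized before @samp{a}.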
 249
 250 @node Glossary, Macros, Implementation Specifics, Top
 251 @section Glossary
 252
 253 @table @r
 254 @item binfo
 255 The main data structure in the compiler used to represent the
 256 inheritance relationships between classes.  The data in the binfo can be
 257 accessed by the BINFO_ accessor macros.
 258
 259 @item vtable
 260 @itemx virtual function table
 261
 262 The virtual function table holds information used in virtual function
 263 dispatching.  In the compiler, they are usually referred to as vtables,
 264 or vtbls.  The first index is not used in the normal way; I believe it
 265 is probably used for the virtual destructor.
 266
 267 @item vfield
 268
 269 vfields can be thought of as the base information needed to build
 270 vtables.  For every vtable that exists for a class, there is a vfield.
 271 See also vtable and virtual function table pointer.  When a type is used
 272 as a base class to another type, the virtual function table for the
 273 derived class can be based upon the vtable for the base class, just
 274 extended to include the additional virtual methods declared in the
 275 derived class.  The virtual function table from a virtual base class is
 276 never reused in a derived class.  @code{is_normal} depends upon this.
 277
 278 @item virtual function table pointer
 279
 280 These are @code{FIELD_DECL}s that are pointer types that point to
 281 vtables.  See also vtable and vfield.
 282 @end table
 283
 284 @node Macros, Typical Behavior, Glossary, Top
 285 @section Macros
 286
 287 This section describes some of the macros used on trees.  The list
 288 should be alphabetical.  Eventually all macros should be documented
 289 here.
 290
 291 @table @code
 292 @item BINFO_BASETYPES
 293 A vector of additional binfos for the types inherited by this basetype.
 294 The binfos are fully unshared (except for virtual bases, in which
 295 case the binfo structure is shared).
 296
 297    If this basetype describes type D as inherited in C,
 298    and if the basetypes of D are E and F,
 299    then this vector contains binfos for inheritance of E and F by C.
 300
 301 Has values of:
 302
 303         TREE_VECs
 304
 305
 306 @item BINFO_INHERITANCE_CHAIN
 307 Temporarily used to represent specific inheritances.  It usually points
 308 to the binfo associated with the lesser derived type, but it can be
 309 reversed by reverse_path.  For example:
 310
 311 @example
 312         Z ZbY   least derived
 313         |
 314         Y YbX
 315         |
 316         X Xb    most derived
 317
 318 TYPE_BINFO (X) == Xb
 319 BINFO_INHERITANCE_CHAIN (Xb) == YbX
 320 BINFO_INHERITANCE_CHAIN (Yb) == ZbY
 321 BINFO_INHERITANCE_CHAIN (Zb) == 0
 322 @end example
 323
 324 Not sure if the above is really true; get_base_distance has it point
 325 towards the most derived type, opposite from above.
 326
 327 Set by build_vbase_path, recursive_bounded_basetype_p,
 328 get_base_distance, lookup_field, lookup_fnfields, and reverse_path.
 329
 330 What things can this be used on:
 331
 332         TREE_VECs that are binfos
 333
 334
 335 @item BINFO_OFFSET
 336 The offset where this basetype appears in its containing type.
 337 BINFO_OFFSET slot holds the offset (in bytes) from the base of the
 338 complete object to the base of the part of the object that is allocated
 339 on behalf of this `type'.  This is always 0 except when there is
 340 multiple inheritance.
 341
 342 Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.
 343
 344
 345 @item BINFO_VIRTUALS
 346 A unique list of functions for the virtual function table.  See also
 347 TYPE_BINFO_VIRTUALS.
 348
 349 What things can this be used on:
 350
 351         TREE_VECs that are binfos
 352
 353
 354 @item BINFO_VTABLE
 355 Used to find the VAR_DECL that is the virtual function table associated
 356 with this binfo.  See also TYPE_BINFO_VTABLE.  To get the virtual
 357 function table pointer, see CLASSTYPE_VFIELD.
 358
 359 What things can this be used on:
 360
 361         TREE_VECs that are binfos
 362
 363 Has values of:
 364
 365         VAR_DECLs that are virtual function tables
 366
 367
 368 @item BLOCK_SUPERCONTEXT
 369 In the outermost scope of each function, it points to the FUNCTION_DECL
 370 node.  It aids in better DWARF support of inline functions.
 371
 372
 373 @item CLASSTYPE_TAGS
 374 CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
 375 class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
 376 these and calls pushtag on them.)
 377
 378 finish_struct scans these to produce TYPE_DECLs to add to the
 379 TYPE_FIELDS of the type.
 380
 381 It is expected that the name found in the TREE_PURPOSE slot is unique;
 382 resolve_scope_to_name is one place that depends upon this
 383 uniqueness.
 384
 385
 386 @item CLASSTYPE_METHOD_VEC
 387 The following is true after finish_struct has been called (on the
 388 class?) but not before.  Before finish_struct is called, things are
 389 different to some extent.  Contains a TREE_VEC of methods of the class.
 390 The TREE_VEC_LENGTH is the number of differently named methods plus one
 391 for the 0th entry.  The 0th entry is always allocated, and reserved for
 392 ctors and dtors.  If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
 393 Each entry of the TREE_VEC is a FUNCTION_DECL.  For each FUNCTION_DECL,
 394 there is a DECL_CHAIN slot.  If the FUNCTION_DECL is the last one with a
 395 given name, the DECL_CHAIN slot is NULL_TREE.  Otherwise it is the next
 396 method that has the same name (but a different signature).  Using the
 397 DECL_CHAIN slot in this way does not, it would seem, prevent us from
 398 calling pushdecl to put the method in the global scope: pushdecl would
 399 overwrite the TREE_CHAIN slot, but the two uses involve different
 400 _CHAINs.  finish_struct_methods sets up one version of the
 401 TREE_CHAIN slots on the FUNCTION_DECLs.
 402
 403 friends are kept in TREE_LISTs, so that there's no need to use their
 404 TREE_CHAIN slot for anything.
 405
 406 Has values of:
 407
 408         TREE_VECs
 409
 410
 411 @item CLASSTYPE_VFIELD
 412 Seems to be in the process of being renamed TYPE_VFIELD.  Use on types
 413 to get the main virtual function table pointer.  To get the virtual
 414 function table use BINFO_VTABLE (TYPE_BINFO ()).
 415
 416 Has values of:
 417
 418         FIELD_DECLs that are virtual function table pointers
 419
 420 What things can this be used on:
 421
 422         RECORD_TYPEs
 423
 424
 425 @item DECL_CLASS_CONTEXT
 426 Identifies the context that the _DECL was found in.  For virtual function
 427 tables, it points to the type associated with the virtual function
 428 table.  See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.
 429
 430 The difference between this and DECL_CONTEXT shows up for virtual
 431 functions like:
 432
 433 @example
 434 struct A
 435 @{
 436   virtual int f ();
 437 @};
 438
 439 struct B : A
 440 @{
 441   int f ();
 442 @};
 443
 444 DECL_CONTEXT (A::f) == A
 445 DECL_CLASS_CONTEXT (A::f) == A
 446
 447 DECL_CONTEXT (B::f) == A
 448 DECL_CLASS_CONTEXT (B::f) == B
 449 @end example
 450
 451 Has values of:
 452
 453         RECORD_TYPEs, or UNION_TYPEs
 454
 455 What things can this be used on:
 456
 457         TYPE_DECLs, _DECLs
 458
 459
 460 @item DECL_CONTEXT
 461 Identifies the context that the _DECL was found in.  Can be used on
 462 virtual function tables to find the type associated with the virtual
 463 function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
 464 better access method.  Internally the same as DECL_FIELD_CONTEXT, so
 465 don't use both.  See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
 466 DECL_CLASS_CONTEXT.
 467
 468 Has values of:
 469
 470         RECORD_TYPEs
 471
 472
 473 What things can this be used on:
 474
 475 @display
 476 VAR_DECLs that are virtual function tables
 477 _DECLs
 478 @end display
 479
 480
 481 @item DECL_FIELD_CONTEXT
 482 Identifies the context that the FIELD_DECL was found in.  Internally the
 483 same as DECL_CONTEXT, so don't use both.  See also DECL_CONTEXT,
 484 DECL_FCONTEXT and DECL_CLASS_CONTEXT.
 485
 486 Has values of:
 487
 488         RECORD_TYPEs
 489
 490 What things can this be used on:
 491
 492 @display
 493 FIELD_DECLs that are virtual function pointers
 494 FIELD_DECLs
 495 @end display
 496
 497
 498 @item DECL_NAME
 499
 500 Has values of:
 501
 502 @display
 503 0 for things that don't have names
 504 IDENTIFIER_NODEs for TYPE_DECLs
 505 @end display
 506
 507 @item DECL_IGNORED_P
 508 A bit that can be set to inform the debug information output routines in
 509 the back-end that a certain _DECL node should be totally ignored.
 510
 511 Used in cases where it is known that the debugging information will be
 512 output in another file, or where a sub-type is known not to be needed
 513 because the enclosing type is not needed.
 514
 515 A compiler constructed virtual destructor in derived classes that do not
 516 define an explicit destructor that was defined explicitly in a base class
 517 has this bit set as well.  Also used on __FUNCTION__ and
 518 __PRETTY_FUNCTION__ to mark that they are ``compiler generated.''  c-decl and
 519 c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,''
 520 and ``user-invisible variable.''
 521
 522 Functions built by the C++ front-end such as default destructors,
 523 virtual destructors and default constructors want to be marked that
 524 they are compiler generated, but unsure why.
 525
 526 Currently, it is used in an absolute way in the C++ front-end, as an
 527 optimization, to tell the debug information output routines to not
 528 generate debugging information that will be output by another separately
 529 compiled file.
 530
 531
 532 @item DECL_VIRTUAL_P
 533 A flag used on FIELD_DECLs and VAR_DECLs.  (Documentation in tree.h is
 534 wrong.)  Used in VAR_DECLs to indicate that the variable is a vtable.
 535 It is also used in FIELD_DECLs for vtable pointers.
 536
 537 What things can this be used on:
 538
 539         FIELD_DECLs and VAR_DECLs
 540
 541
 542 @item DECL_VPARENT
 543 Used to point to the parent type of the vtable if there is one, else it
 544 is just the type associated with the vtable.  Because of the sharing of
 545 virtual function tables that goes on, this slot is not very useful, and
 546 is in fact, not used in the compiler at all.  It can be removed.
 547
 548 What things can this be used on:
 549
 550         VAR_DECLs that are virtual function tables
 551
 552 Has values of:
 553
 554         RECORD_TYPEs maybe UNION_TYPEs
 555
 556
 557 @item DECL_FCONTEXT
 558 Used to find the first baseclass in which this FIELD_DECL is defined.
 559 See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.
 560
 561 How it is used:
 562
 563         Used when writing out debugging information about vfield and
 564         vbase decls.
 565
 566 What things can this be used on:
 567
 568         FIELD_DECLs that are virtual function pointers
 569         FIELD_DECLs
 570
 571
 572 @item DECL_REFERENCE_SLOT
 573 Used to hold the initializer for the reference.
 574
 575 What things can this be used on:
 576
 577         PARM_DECLs and VAR_DECLs that have a reference type
 578
 579
 580 @item DECL_VINDEX
 581 Used for FUNCTION_DECLs in two different ways.  Before the structure
 582 containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
 583 FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
 584 FUNCTION_DECL will replace as a virtual function.  When the class is
 585 laid out, this pointer is changed to an INTEGER_CST node which is
 586 suitable to find an index into the virtual function table.  See
 587 get_vtable_entry as to how one can find the right index into the virtual
 588 function table.  The first index, 0, of a virtual function table is not
 589 used in the normal way, so the first real index is 1.
 590
 591 DECL_VINDEX may be a TREE_LIST, that would seem to be a list of
 592 overridden FUNCTION_DECLs.  add_virtual_function has code to deal with
 593 this when it uses the variable base_fndecl_list, but it would seem that
 594 somehow, it is possible for the TREE_LIST to persist until method_call,
 595 and it should not.
 596
 597
 598 What things can this be used on:
 599
 600         FUNCTION_DECLs
 601
 602
 603 @item DECL_SOURCE_FILE
 604 Identifies what source file a particular declaration was found in.
 605
 606 Has values of:
 607
 608         "<built-in>" on TYPE_DECLs to mean the typedef is built in
 609
 610
 611 @item DECL_SOURCE_LINE
 612 Identifies what source line number in the source file the declaration
 613 was found at.
 614
 615 Has values of:
 616
 617 @display
 618 0 for an undefined label
 619
 620 0 for TYPE_DECLs that are internally generated
 621
 622 0 for FUNCTION_DECLs for functions generated by the compiler
 623         (not yet, but should be)
 624
 625 0 for ``magic'' arguments to functions, that the user has no
 626         control over
 627 @end display
 628
 629
 630 @item TREE_USED
 631
 632 Has values of:
 633
 634         0 for unused labels
 635
 636
 637 @item TREE_ADDRESSABLE
 638 A flag that is set for any type that has a constructor.
 639
 640
 641 @item TREE_COMPLEXITY
 642 They seem a kludgy way to track recursion, popping, and pushing.  They only
 643 appear in cp-decl.c and cp-decl2.c, so they are a good candidate for
 644 proper fixing, and removal.
 645
 646
 647 @item TREE_HAS_CONSTRUCTOR
 648 A flag to indicate when a CALL_EXPR represents a call to a constructor.
 649 If set, we know that the type of the object, is the complete type of the
 650 object, and that the value returned is nonnull.  When used in this
 651 fashion, it is an optimization.  Can also be used on SAVE_EXPRs to
 652 indicate when they are of fixed type and nonnull.  Can also be used on
 653 INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.
 654
 655
 656 @item TREE_PRIVATE
 657 Set for FIELD_DECLs by finish_struct.  But not uniformly set.
 658
 659 The following routines do something with PRIVATE access:
 660 build_method_call, alter_access, finish_struct_methods,
 661 finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
 662 CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
 663 GNU_xref_member, dbxout_type_fields, dbxout_type_method_1
 664
 665
 666 @item TREE_PROTECTED
 667 The following routines do something with PROTECTED access:
 668 build_method_call, alter_access, finish_struct, convert_to_aggr,
 669 CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
 670 compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
 671 dbxout_type_method_1
 672
 673
 674 @item TYPE_BINFO
 675 Used to get the binfo for the type.
 676
 677 Has values of:
 678
 679         TREE_VECs that are binfos
 680
 681 What things can this be used on:
 682
 683         RECORD_TYPEs
 684
 685
 686 @item TYPE_BINFO_BASETYPES
 687 See also BINFO_BASETYPES.
 688
 689 @item TYPE_BINFO_VIRTUALS
 690 A unique list of functions for the virtual function table.  See also
 691 BINFO_VIRTUALS.
 692
 693 What things can this be used on:
 694
 695         RECORD_TYPEs
 696
 697
 698 @item TYPE_BINFO_VTABLE
 699 Points to the virtual function table associated with the given type.
 700 See also BINFO_VTABLE.
 701
 702 What things can this be used on:
 703
 704         RECORD_TYPEs
 705
 706 Has values of:
 707
 708         VAR_DECLs that are virtual function tables
 709
 710
 711 @item TYPE_NAME
 712 Names the type.
 713
 714 Has values of:
 715
 716 @display
 717 0 for things that don't have names.
 718 should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
 719         ENUM_TYPEs.
 720 TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
 721         shouldn't be.
 722 TYPE_DECL for typedefs, unsure why.
 723 @end display
 724
 725 What things can one use this on:
 726
 727 @display
 728 TYPE_DECLs
 729 RECORD_TYPEs
 730 UNION_TYPEs
 731 ENUM_TYPEs
 732 @end display
 733
 734 History:
 735
 736         It currently points to the TYPE_DECL for RECORD_TYPEs,
 737         UNION_TYPEs and ENUM_TYPEs, but it should be history soon.
 738
 739
 740 @item TYPE_METHODS
 741 Synonym for @code{CLASSTYPE_METHOD_VEC}.  Chained together with
 742 @code{TREE_CHAIN}.  @file{dbxout.c} uses this to get at the methods of a
 743 class.
 744
 745
 746 @item TYPE_DECL
 747 Used to represent typedefs, and used to represent binding layers.
 748
 749 Components:
 750
 751         DECL_NAME is the name of the typedef.  For example, foo would
 752         be found in the DECL_NAME slot when @code{typedef int foo;} is
 753         seen.
 754
 755         DECL_SOURCE_LINE identifies what source line number in the
 756         source file the declaration was found at.  A value of 0
 757         indicates that this TYPE_DECL is just an internal binding layer
 758         marker, and does not correspond to a user supplied typedef.
 759
 760         DECL_SOURCE_FILE
 761
 762 @item TYPE_FIELDS
 763 A linked list (via @code{TREE_CHAIN}) of member types of a class.  The
 764 list can contain @code{TYPE_DECL}s, but there can also be other things
 765 in the list apparently.  See also @code{CLASSTYPE_TAGS}.
 766
 767
 768 @item TYPE_VIRTUAL_P
 769 A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is
 770 a virtual function table or a pointer to one.  When used on a
 771 @code{FUNCTION_DECL}, indicates that it is a virtual function.  When
 772 used on an @code{IDENTIFIER_NODE}, indicates that a function with this
 773 same name exists and has been declared virtual.
 774
 775 When used on types, it indicates that the type has virtual functions, or
 776 is derived from one that does.
 777
 778 Not sure if the above about virtual function tables is still true.  See
 779 also info on @code{DECL_VIRTUAL_P}.
 780
 781 What things can this be used on:
 782
 783         FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs
 784
 785
 786 @item VF_BASETYPE_VALUE
 787 Get the associated type from the binfo that caused the given vfield to
 788 exist.  This is the least derived class (the most parent class) that
 789 needed a virtual function table.  It is probably the case that all uses
 790 of this field are misguided, but they need to be examined on a
 791 case-by-case basis.  See history for more information on why the
 792 previous statement was made.
 793
 794 Set at @code{finish_base_struct} time.
 795
 796 What things can this be used on:
 797
 798         TREE_LISTs that are vfields
 799
 800 History:
 801
 802         This field was used to determine if a virtual function table's
 803         slot should be filled in with a certain virtual function, by
 804         checking to see if the type returned by VF_BASETYPE_VALUE was a
 805         parent of the context in which the old virtual function existed.
 806         This incorrectly assumes that a given type _could_ not appear as
 807         a parent twice in a given inheritance lattice.  For single
 808         inheritance, this would in fact work, because a type could not
 809         possibly appear more than once in an inheritance lattice, but
 810         with multiple inheritance, a type can appear more than once.
 811
 812
 813 @item VF_BINFO_VALUE
 814 Identifies the binfo that caused this vfield to exist.  If this vfield
 815 is from the first direct base class that has a virtual function table,
 816 then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
 817 direct base where the vfield came from.  Can use @code{TREE_VIA_VIRTUAL}
 818 on result to find out if it is a virtual base class.  Related to the
 819 binfo found by
 820
 821 @example
 822 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
 823 @end example
 824
 825 @noindent
 826 where @samp{t} is the type that has the given vfield.
 827
 828 @example
 829 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
 830 @end example
 831
 832 @noindent
 833 will return the binfo for the given vfield.
 834
 835 May or may not be set at @code{modify_vtable_entries} time.  Set at
 836 @code{finish_base_struct} time.
 837
 838 What things can this be used on:
 839
 840         TREE_LISTs that are vfields
 841
 842
 843 @item VF_DERIVED_VALUE
 844 Identifies the type of the most derived class of the vfield, excluding
 845 the class this vfield is for.
 846
 847 Set at @code{finish_base_struct} time.
 848
 849 What things can this be used on:
 850
 851         TREE_LISTs that are vfields
 852
 853
 854 @item VF_NORMAL_VALUE
 855 Identifies the type of the most derived class of the vfield, including
 856 the class this vfield is for.
 857
 858 Set at @code{finish_base_struct} time.
 859
 860 What things can this be used on:
 861
 862         TREE_LISTs that are vfields
 863
 864
 865 @item WRITABLE_VTABLES
 866 This is an option that can be defined when building the compiler, that
 867 will cause the compiler to output vtables into the data segment so that
 868 the vtables may be written.  This is undefined by default, because
 869 normally the vtables should be unwritable.  People who implement object
 870 I/O facilities, or who want to change the dynamic type of objects, may
 871 want to have the vtables writable.  Another way of achieving
 872 this would be to make a copy of the vtable into writable memory, but the
 873 drawback there is that that method only changes the type for one object.
 874
 875 @end table
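
The entries for @code{BINFO_OFFSET} and @code{VF_BASETYPE_VALUE} both turn
on the fact that, with multiple inheritance, a type can appear more than
once in an inheritance lattice, each time at a different offset.  The
following sketch shows this at the source level.  The class names and
helper functions are hypothetical, and the code inspects ordinary object
layout, not the compiler's binfo structures.

```cpp
#include <cstddef>

struct A { int a; };
struct B : A { int b; };
struct C : A { int c; };
struct D : B, C { int d; };  // A appears twice in D's inheritance lattice

// Address of the A subobject reached through B, and through C.
const A *a_via_b(const D &d) { return static_cast<const B *>(&d); }
const A *a_via_c(const D &d) { return static_cast<const C *>(&d); }

// Byte offset of a base subobject from the start of the complete object,
// the quantity that BINFO_OFFSET is described as holding.
std::ptrdiff_t b_base_offset(const D &d) {
  return reinterpret_cast<const char *>(static_cast<const B *>(&d)) -
         reinterpret_cast<const char *>(&d);
}

std::ptrdiff_t c_base_offset(const D &d) {
  return reinterpret_cast<const char *>(static_cast<const C *>(&d)) -
         reinterpret_cast<const char *>(&d);
}
```

For any @code{D} object the two @code{A} subobjects have distinct
addresses, so asking whether @code{A} is a parent of @code{D} does not
identify a single base; this is the ambiguity the history note for
@code{VF_BASETYPE_VALUE} describes.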
 876
 877 @node Typical Behavior, Coding Conventions, Macros, Top
 878 @section Typical Behavior
 879
 880 @cindex parse errors
 881
 882 Whenever seemingly normal code fails with errors like
 883 @code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
 884 returning a NULL_TREE for whatever reason.
 885
 886 @node Coding Conventions, Templates, Typical Behavior, Top
 887 @section Coding Conventions
 888
 889 It should never be the case that trees are modified in-place by the
 890 back-end, @emph{unless} it is guaranteed that the semantics are the same
 891 no matter how shared the tree structure is.  @file{fold-const.c} still
 892 has some cases where this is not true, but rms hypothesizes that this
 893 will never be a problem.
 894
 895 @node Templates, Access Control, Coding Conventions, Top
 896 @section Templates
 897
 898 A template is represented by a @code{TEMPLATE_DECL}.  The specific
 899 fields used are:
 900
 901 @table @code
 902 @item DECL_TEMPLATE_RESULT
 903 The generic decl on which instantiations are based.  This looks just
 904 like any other decl.
 905
 906 @item DECL_TEMPLATE_PARMS
 907 The parameters to this template.
 908 @end table
 909
 910 The generic decl is parsed as much like any other decl as possible,
 911 given the parameterization.  The template decl is not built up until the
 912 generic decl has been completed.  For template classes, a template decl
 913 is generated for each member function and static data member, as well.
 914
 915 Template members of template classes are represented by a TEMPLATE_DECL
 916 for the class' parameters around another TEMPLATE_DECL for the member's
 917 parameters.
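
As a source-level illustration of the nesting described above, the
following sketch declares a class template with a member template.  The
names @code{Box}, @code{convert} and @code{box_demo} are hypothetical
and chosen only for this example; the member has one parameter list for
the class and another for itself.

```cpp
// Hypothetical example: a member template of a class template.
// Outer parameter list: <class T> (the class' parameters).
// Inner parameter list: <class U> (the member's own parameters).
template <class T>
struct Box {
  T value;

  template <class U>
  U convert() const { return U(value); }
};

// Instantiates both levels: T = int for the class, U = double for the member.
double box_demo() {
  Box<int> b = { 3 };
  return b.convert<double>();
}
```

Here @code{box_demo ()} forces an instantiation of both the class and
its member template.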
 918
 919 All declarations that are instantiations or specializations of templates
 920 refer to their template and parameters through DECL_TEMPLATE_INFO.
 921
 922 How should I handle parsing member functions with the proper param
 923 decls?  Set them up again or try to use the same ones?  Currently we do
 924 the former.  We can probably do this without any extra machinery in
 925 store_pending_inline, by deducing the parameters from the decl in
 926 do_pending_inlines.  PRE_PARSED_TEMPLATE_DECL?
 927
 928 If a base is a parm, we can't check anything about it.  If a base is not
 929 a parm, we need to check it for name binding.  Do finish_base_struct if
 930 no bases are parameterized (only if none, including indirect, are
 931 parms).  Nah, don't bother trying to do any of this until instantiation
 932 -- we only need to do name binding in advance.
 933
 934 Always set up method vec and fields, inc. synthesized methods.  Really?
 935 We can't know the types of the copy folks, or whether we need a
 936 destructor, or can have a default ctor, until we know our bases and
 937 fields.  Otherwise, we can assume and fix ourselves later.  Hopefully.
 938
 939 @node Access Control, Error Reporting, Templates, Top
 940 @section Access Control
 941 The function compute_access returns one of three values:
 942
 943 @table @code
 944 @item access_public
 945 means that the field can be accessed by the current lexical scope.
 946
 947 @item access_protected
 948 means that the field cannot be accessed by the current lexical scope
 949 because it is protected.
 950
 951 @item access_private
 952 means that the field cannot be accessed by the current lexical scope
 953 because it is private.
 954 @end table
 955
 956 DECL_ACCESS is used for access declarations; alter_access creates a list
 957 of types and accesses for a given decl.
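
For readers less familiar with the language rules involved, here is a
small source-level sketch of the three access levels and of an access
adjustment in a derived class.  The class names are hypothetical, and
the adjustment is written as a using-declaration, which is how standard
C++ now spells what this section calls an access declaration.

```cpp
// Hypothetical classes illustrating public, protected and private
// members, plus an access adjustment in a derived class.
class Base {
public:
  Base() : pub(1), prot(2), priv(3) {}
  int pub;
  int get_priv() const { return priv; }
protected:
  int prot;
private:
  int priv;
};

class Derived : private Base {
public:
  using Base::pub;                       // restores public access to pub
  int get_prot() const { return prot; }  // protected: visible in a derived class
  // int peek() const { return priv; }   // would be rejected: priv is private
};

int access_demo() {
  Derived d;
  return d.pub + d.get_prot();  // 1 + 2
}
```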
 958
 959 Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
 960 codes of compute_access and were used as a cache for compute_access.
 961 Now they are not used at all.
 962
 963 TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
 964 granted by the containing class.  BEWARE: TREE_PUBLIC means something
 965 completely unrelated to access control!
 966
 967 @node Error Reporting, Parser, Access Control, Top
 968 @section Error Reporting
 969
 970 The C++ front-end uses a call-back mechanism to allow functions to print
 971 out reasonable strings for types and functions without putting extra
 972 logic in the functions where errors are found.  The interface is through
 973 the @code{cp_error} function (or @code{cp_warning}, etc.).  The
 974 syntax is exactly like that of @code{error}, except that a few more
 975 conversions are supported:
 976
 977 @itemize @bullet
 978 @item
 979 %C indicates a value of `enum tree_code'.
 980 @item
 981 %D indicates a *_DECL node.
 982 @item
 983 %E indicates a *_EXPR node.
 984 @item
 985 %L indicates a value of `enum languages'.
 986 @item
 987 %P indicates the name of a parameter (i.e. "this", "1", "2", ...)
 988 @item
 989 %T indicates a *_TYPE node.
 990 @item
 991 %O indicates the name of an operator (MODIFY_EXPR -> "operator =").
 992
 993 @end itemize
 994
 995 There is some overlap between these; for instance, any of the node
 996 options can be used for printing an identifier (though only @code{%D}
 997 tries to decipher function names).
 998
 999 For a more verbose message (@code{class foo} as opposed to just @code{foo},
1000 including the return type for functions), use @code{%#c}.
1001 To have the line number on the error message indicate the line of the
1002 DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
1003 use @code{%+D}, or it will default to the first.
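
The call-back idea can be sketched in isolation as follows.  This is a
toy model, not the code in the front-end: @code{ToyNode},
@code{printer_for} and @code{toy_format} are hypothetical names, only
the @code{%D}, @code{%T} and @code{%E} conversions are modelled, and
the @code{#} flag simply prepends a keyword such as @code{class}.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree node; not the real GCC type.
struct ToyNode {
  std::string keyword;  // e.g. "class"; printed only in verbose mode
  std::string name;
};

typedef std::string (*Printer)(const ToyNode&, bool verbose);

static std::string print_node(const ToyNode& n, bool verbose) {
  if (verbose && !n.keyword.empty()) return n.keyword + " " + n.name;
  return n.name;
}

// Maps a conversion character to its printing call-back (0 if unknown).
static Printer printer_for(char conv) {
  switch (conv) {
    case 'D': case 'T': case 'E': return print_node;
    default: return 0;
  }
}

// Expands %D, %T, %E (optionally with '#') using successive arguments;
// anything else is copied through unchanged.
std::string toy_format(const std::string& fmt,
                       const std::vector<ToyNode>& args) {
  std::string out;
  std::size_t next = 0;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '%' || i + 1 >= fmt.size()) { out += fmt[i]; continue; }
    std::size_t j = i + 1;
    bool verbose = false;
    if (fmt[j] == '#') { verbose = true; ++j; }
    Printer p = (j < fmt.size()) ? printer_for(fmt[j]) : 0;
    if (p && next < args.size()) {
      out += p(args[next++], verbose);
      i = j;  // skip past the conversion character
    } else {
      out += fmt[i];
    }
  }
  return out;
}
```

The point of the design is that the caller only names the kind of node
in the format string; the knowledge of how to print each kind lives in
one place.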
1004
1005 @node Parser, Copying Objects, Error Reporting, Top
1006 @section Parser
1007
1008 Some comments on the parser:
1009
1010 The @code{after_type_declarator} / @code{notype_declarator} hack is
1011 necessary in order to allow redeclarations of @code{TYPENAME}s, for
1012 instance
1013
1014 @example
1015 typedef int foo;
1016 class A @{
1017   char *foo;
1018 @};
1019 @end example
1020
In the above, the first @code{foo} is parsed as a @code{notype_declarator},
and the second as an @code{after_type_declarator}.
1023
1024 Ambiguities:
1025
1026 There are currently four reduce/reduce ambiguities in the parser.  They are:
1027
1028 1) Between @code{template_parm} and
1029 @code{named_class_head_sans_basetype}, for the tokens @code{aggr
1030 identifier}.  This situation occurs in code looking like
1031
1032 @example
1033 template <class T> class A @{ @};
1034 @end example
1035
1036 It is ambiguous whether @code{class T} should be parsed as the
1037 declaration of a template type parameter named @code{T} or an unnamed
1038 constant parameter of type @code{class T}.  Section 14.6, paragraph 3 of
1039 the January '94 working paper states that the first interpretation is
1040 the correct one.  This ambiguity results in two reduce/reduce conflicts.
1041
1042 2) Between @code{primary} and @code{type_id} for code like @samp{int()}
1043 in places where both can be accepted, such as the argument to
1044 @code{sizeof}.  Section 8.1 of the pre-San Diego working paper specifies
1045 that these ambiguous constructs will be interpreted as @code{typename}s.
1046 This ambiguity results in six reduce/reduce conflicts between
1047 @samp{absdcl} and @samp{functional_cast}.
1048
1049 3) Between @code{functional_cast} and
1050 @code{complex_direct_notype_declarator}, for various token strings.
1051 This situation occurs in code looking like
1052
1053 @example
1054 int (*a);
1055 @end example
1056
1057 This code is ambiguous; it could be a declaration of the variable
1058 @samp{a} as a pointer to @samp{int}, or it could be a functional cast of
1059 @samp{*a} to @samp{int}.  Section 6.8 specifies that the former
1060 interpretation is correct.  This ambiguity results in 7 reduce/reduce
conflicts.  Another aspect of this ambiguity is code like @samp{int (x[2]);},
which is resolved at the @samp{[} and accounts for 6 reduce/reduce conflicts
between @samp{direct_notype_declarator} and
@samp{primary}/@samp{overqualified_id}.  Finally, there are 4 reduce/reduce
conflicts between @samp{expr_or_declarator} and @samp{primary} over code
like @samp{int (a);}, which could probably be resolved but would also
1067 probably be more trouble than it's worth.  In all, this situation
1068 accounts for 17 conflicts.  Ack!
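
The declaration reading of these forms can be checked with a small
function such as the following sketch (the function name
@code{declarator_demo} is hypothetical).  Each statement in it is
accepted as a declaration, not as an expression statement.

```cpp
// Each statement below is a declaration, despite the parentheses.
int declarator_demo() {
  int (x) = 5;              // declares x; not a functional cast of x to int
  int (*a) = &x;            // declares a as a pointer to int
  int (arr[2]) = { 1, 2 };  // declares arr as an array of two ints
  return *a + arr[1];       // 5 + 2
}
```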
1069
1070 The second case above is responsible for the failure to parse 'LinppFile
1071 ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave
1072 Math.h++) as an object declaration, and must be fixed so that it does
1073 not resolve until later.
1074
1075 4) Indirectly between @code{after_type_declarator} and @code{parm}, for
1076 type names.  This occurs in (as one example) code like
1077
1078 @example
1079 typedef int foo, bar;
1080 class A @{
1081   foo (bar);
1082 @};
1083 @end example
1084
1085 What is @code{bar} inside the class definition?  We currently interpret
1086 it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
1087 @code{after_type_declarator}.  I believe that xlC is correct, in light
1088 of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
1089 could possibly be a type name is taken as the @i{decl-specifier-seq} of
1090 a @i{declaration}."  However, it seems clear that this rule must be
1091 violated in the case of constructors.  This ambiguity accounts for 8
1092 conflicts.
1093
1094 Unlike the others, this ambiguity is not recognized by the Working Paper.
1095
1096 @node  Copying Objects, Exception Handling, Parser, Top
1097 @section Copying Objects
1098
1099 The generated copy assignment operator in g++ does not currently do the
1100 right thing for multiple inheritance involving virtual bases; it just
1101 calls the copy assignment operators for its direct bases.  What it
1102 should probably do is:
1103
1104 1) Split up the copy assignment operator for all classes that have
1105 vbases into "copy my vbases" and "copy everything else" parts.  Or do
1106 the trickiness that the constructors do to ensure that vbases don't get
1107 initialized by intermediate bases.
1108
1109 2) Wander through the class lattice, find all vbases for which no
1110 intermediate base has a user-defined copy assignment operator, and call
1111 their "copy everything else" routines.  If not all of my vbases satisfy
1112 this criterion, warn, because this may be surprising behavior.
1113
1114 3) Call the "copy everything else" routine for my direct bases.
1115
1116 If we only have one direct base, we can just foist everything off onto
1117 them.
1118
1119 This issue is currently under discussion in the core reflector
1120 (2/28/94).
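
The situation can be observed with a diamond hierarchy like the one
sketched below.  All names are hypothetical.  The counter records how
often the virtual base's assignment operator runs during a single
assignment of the most derived object; when each direct base simply
assigns its own view of the virtual base, the count can exceed one.

```cpp
// Hypothetical diamond: V is a virtual base of both A and B.
struct V {
  static int assign_count;  // how many times V::operator= has run
  int n;
  V() : n(0) {}
  V& operator=(const V& other) {
    ++assign_count;
    n = other.n;
    return *this;
  }
};
int V::assign_count = 0;

struct A : virtual V {};
struct B : virtual V {};
struct C : A, B {};  // uses the generated copy assignment operator

// Returns how many times V was assigned while assigning one C.
int count_vbase_assignments() {
  C from, to;
  from.n = 42;
  V::assign_count = 0;
  to = from;
  return V::assign_count;
}
```

The exact count is left open here on purpose, since the language does
not require the virtual base to be assigned only once.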
1121
1122 @node  Exception Handling, Free Store, Copying Objects, Top
1123 @section Exception Handling
1124
1125 Note, exception handling in g++ is still under development.
1126
1127 This section describes the mapping of C++ exceptions in the C++
1128 front-end, into the back-end exception handling framework.
1129
1130 The basic mechanism of exception handling in the back-end is
1131 unwind-protect a la elisp.  This is a general, robust, and language
1132 independent representation for exceptions.
1133
The C++ front-end exceptions are mapped into the unwind-protect
semantics by the C++ front-end.  The mapping is described below.
1136
When -frtti is used, rtti is used to do exception object type checking;
when it isn't used, the encoded name for the type of the object being
thrown is used instead.  All code that originates exceptions, even code
that throws exceptions as a side effect, like dynamic casting, and all
code that catches exceptions must be compiled consistently with either
-frtti or -fno-rtti.  It is not possible to mix rtti-based exception
handling objects with code that doesn't use rtti.  The exceptions to
this are code that doesn't catch or throw exceptions, catch (...), and
code that just rethrows an exception.
1146
Currently we use the normal mangling used in building function names
(int's are "i", const char * is PCc) to build the non-rtti base type
descriptors for exception handling.  These descriptors are just plain
NUL-terminated strings, and internally they are passed around as char
1151 *.
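
Since the descriptors are plain strings, an exact type match reduces to
a string comparison.  The following is a minimal sketch of that idea
only; @code{toy_descriptor_match} is a hypothetical helper, not the
routine the runtime uses.

```cpp
#include <cstring>

// Hypothetical helper: compares two non-rtti type descriptors, which are
// plain NUL-terminated strings holding the mangled type.
bool toy_descriptor_match(const char* catch_type, const char* thrown_type) {
  return std::strcmp(catch_type, thrown_type) == 0;
}
```

For instance, a thrown @code{int} (descriptor @samp{i}) matches a
handler whose descriptor is @samp{i}, but not one whose descriptor is
@samp{PCc}.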
1152
In C++, all cleanups should be protected by exception regions.  The
region starts just after the reason why the cleanup is created has
ended.  For example, with an automatic variable that has a constructor,
it would be right after the constructor is run.  The region ends just
before the finalization is expanded.  Since the backend may expand the
cleanup multiple times along different paths (once for normal end of the
region, once for non-local gotos, once for returns, etc.), the backend
must take special care to protect the finalization expansion if the
expansion is for any reason other than normal region end and it is
`inline' (it is inside the exception region).  The backend can either
choose to move them out of line, or it can create an exception region
over the finalization to protect it; the handler associated with that
region would not run the finalization as it otherwise would have, but
would rather just rethrow to the outer handler, careful to skip the
normal handler for the original region.
1168
In Ada, they will use the more runtime intensive approach of having
fewer regions, but at the cost of additional work at run time, to keep a
list of things that need cleanups.  When a variable has finished
construction, they add the cleanup to the list; when they come to the end
of the lifetime of the variable, they run the list down.  If they take a
hit before the section finishes normally, they examine the list for
actions to perform.  I hope they add this logic into the back-end, as it
would be nice to get that alternative approach in C++.
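
The list-based approach can be sketched as follows.  This is only an
illustration of the idea: @code{CleanupList} and the other names are
hypothetical, and running the cleanups in reverse order of registration
is an assumption made here to mirror destructor order.

```cpp
#include <string>
#include <vector>

typedef void (*CleanupFn)(void* arg);

struct CleanupEntry {
  CleanupFn fn;
  void* arg;
};

// Hypothetical run-time list of pending cleanups.
class CleanupList {
public:
  // Called once a variable has finished construction.
  void add(CleanupFn fn, void* arg) {
    CleanupEntry e = { fn, arg };
    entries_.push_back(e);
  }
  // Called at normal scope exit or when an exception passes through.
  void run_all() {
    while (!entries_.empty()) {
      CleanupEntry e = entries_.back();
      entries_.pop_back();
      e.fn(e.arg);
    }
  }
private:
  std::vector<CleanupEntry> entries_;
};

static std::string cleanup_log;
static void log_cleanup(void* arg) {
  cleanup_log += static_cast<const char*>(arg);
}

// Registers two cleanups and runs the list; returns the order they ran in.
std::string cleanup_demo() {
  cleanup_log.clear();
  CleanupList list;
  char first[] = "a";
  char second[] = "b";
  list.add(log_cleanup, first);
  list.add(log_cleanup, second);
  list.run_all();
  return cleanup_log;
}
```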
1177
On an rs6000, xlC stores exception objects on the stack, under the try
block.  When it unwinds down into a handler, the frame pointer is
adjusted back to the normal value for the frame in which the handler
resides, and the stack pointer is left unchanged from the time at which
the object was thrown.  This is so that there is always someplace for
the exception object, and nothing can overwrite it, once we start
throwing.  The only bad part is that the stack remains large.
1185
1186 The below points out some things that work in g++'s exception handling.
1187
All completely constructed temps and local variables are cleaned up in
all unwound scopes.  Completely constructed parts of partially
1190 constructed objects are cleaned up.  This includes partially built
1191 arrays.  Exception specifications are now handled.  Thrown objects are
1192 now cleaned up all the time.  We can now tell if we have an active
1193 exception being thrown or not (__eh_type != 0).  We use this to call
1194 terminate if someone does a throw; without there being an active
1195 exception object.  uncaught_exception () works.  Exception handling
1196 should work right if you optimize.  Exception handling should work with
1197 -fpic or -fPIC.
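
The cleanup of completely constructed parts of a partially constructed
object can be exercised with a test along these lines.  The class names
are hypothetical.  The second member throws from its constructor, so
only the first member has been constructed and only it is destroyed.

```cpp
#include <stdexcept>

static int destroyed = 0;  // counts Part destructors that have run

struct Part {
  ~Part() { ++destroyed; }
};

struct Bomb {
  Bomb() { throw std::runtime_error("boom"); }
};

// Members are constructed in declaration order: first, second, third.
struct Whole {
  Part first;
  Bomb second;  // throws, so third is never constructed
  Part third;
};

// Returns true if the exception from Bomb's constructor was caught.
bool build_whole() {
  try {
    Whole w;
    (void)w;
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

After a call to @code{build_whole ()}, the counter @code{destroyed}
holds 1: @code{first} was cleaned up, and @code{third} never existed.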
1198
1199 The below points out some flaws in g++'s exception handling, as it now
1200 stands.
1201
Only exact type matching or reference matching of throw types works when
-fno-rtti is used.  Exception handling only works on SPARC (like Suns)
(both -mflat and -mno-flat models work), SPARClite, Hitachi SH, i386,
arm, rs6000, PowerPC, Alpha, mips, VAX, m68k and z8k machines.  SPARC v9
may not work.  HPPA is mostly done, but throwing between a shared
library and user code doesn't yet work.  Some targets have support for
data-driven unwinding.  Partial support is in for all other machines,
but a stack unwinder called __unwind_function has to be written, and
added to libgcc2 for them.  The new EH code doesn't rely upon the
__unwind_function for C++ code; instead it creates per-function
unwinders right inside the function.  Unfortunately, on many platforms
the definition of RETURN_ADDR_RTX in the tm.h file for the machine port
is wrong.  See below for details on __unwind_function.  RTL_EXPRs for EH
cond variables for && and || exprs should probably be wrapped in
UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved.
1217
1218 We only do pointer conversions on exception matching a la 15.3 p2 case
1219 3: `A handler with type T, const T, T&, or const T& is a match for a
1220 throw-expression with an object of type E if [3]T is a pointer type and
1221 E is a pointer type that can be converted to T by a standard pointer
1222 conversion (_conv.ptr_) not involving conversions to pointers to private
1223 or protected base classes.' when -frtti is given.
1224
1225 We don't call delete on new expressions that die because the ctor threw
1226 an exception.  See except/18 for a test case.
1227
15.2 para 13: The exception being handled should be rethrown if control
reaches the end of a handler of the function-try-block of a constructor
or destructor; right now, it is not.
1231
1232 15.2 para 12: If a return statement appears in a handler of
1233 function-try-block of a constructor, the program is ill-formed, but this
1234 isn't diagnosed.
1235
1236 15.2 para 11: If the handlers of a function-try-block contain a jump
1237 into the body of a constructor or destructor, the program is ill-formed,
1238 but this isn't diagnosed.
1239
1240 15.2 para 9: Check that the fully constructed base classes and members
1241 of an object are destroyed before entering the handler of a
1242 function-try-block of a constructor or destructor for that object.
1243
1244 build_exception_variant should sort the incoming list, so that it
1245 implements set compares, not exact list equality.  Type smashing should
1246 smash exception specifications using set union.
1247
1248 Thrown objects are usually allocated on the heap, in the usual way.  If
1249 one runs out of heap space, throwing an object will probably never work.
1250 This could be relaxed some by passing an __in_chrg parameter to track
1251 who has control over the exception object.  Thrown objects are not
1252 allocated on the heap when they are pointer to object types.  We should
1253 extend it so that all small (<4*sizeof(void*)) objects are stored
1254 directly, instead of allocated on the heap.
1255
1256 When the backend returns a value, it can create new exception regions
1257 that need protecting.  The new region should rethrow the object in
1258 context of the last associated cleanup that ran to completion.
1259
1260 The structure of the code that is generated for C++ exception handling
1261 code is shown below:
1262
1263 @example
1264 Ln:                                     throw value;
1265         copy value onto heap
1266         jump throw (Ln, id, address of copy of value on heap)
1267
1268                                         try @{
1269 +Lstart:        the start of the main EH region
1270 |...                                            ...
1271 +Lend:          the end of the main EH region
1272                                         @} catch (T o) @{
1273                                                 ...1
1274                                         @}
1275 Lresume:
1276         nop     used to make sure there is something before
1277                 the next region ends, if there is one
1278 ...                                     ...
1279
1280         jump Ldone
1281 [
1282 Lmainhandler:    handler for the region Lstart-Lend
1283         cleanup
1284 ] zero or more, depending upon automatic vars with dtors
1285 +Lpartial:
1286 |        jump Lover
1287 +Lhere:
1288         rethrow (Lhere, same id, same obj);
1289 Lterm:          handler for the region Lpartial-Lhere
1290         call terminate
1291 Lover:
1292 [
1293  [
1294         call throw_type_match
1295         if (eq) @{
1296  ] these lines disappear when there is no catch condition
1297 +Lsregion2:
1298 |       ...1
1299 |       jump Lresume
1300 |Lhandler:      handler for the region Lsregion2-Leregion2
1301 |       rethrow (Lresume, same id, same obj);
1302 +Leregion2
1303         @}
1304 ] there are zero or more of these sections, depending upon how many
1305   catch clauses there are
1306 ----------------------------- expand_end_all_catch --------------------------
1307                 here we have fallen off the end of all catch
1308                 clauses, so we rethrow to outer
1309         rethrow (Lresume, same id, same obj);
1310 ----------------------------- expand_end_all_catch --------------------------
1311 [
1312 L1:     maybe throw routine
1313 ] depending upon if we have expanded it or not
1314 Ldone:
1315         ret
1316
1317 start_all_catch emits labels: Lresume,
1318
1319 @end example
1320
1321 The __unwind_function takes a pointer to the throw handler, and is
1322 expected to pop the stack frame that was built to call it, as well as
1323 the frame underneath and then jump to the throw handler.  It must
1324 restore all registers to their proper values as well as all other
1325 machine state as determined by the context in which we are unwinding
1326 into.  The way I normally start is to compile:
1327
@example
void *g;
void foo (void *a) @{ g = a; @}
@end example
1330
1331 with -S, and change the thing that alters the PC (return, or ret
1332 usually) to not alter the PC, making sure to leave all other semantics
1333 (like adjusting the stack pointer, or frame pointers) in.  After that,
1334 replicate the prologue once more at the end, again, changing the PC
1335 altering instructions, and finally, at the very end, jump to `g'.
1336
It takes about a week to write this routine.  If someone wants to
volunteer to write this routine for any architecture, exception support
1339 for that architecture will be added to g++.  Please send in those code
1340 donations.  One other thing that needs to be done, is to double check
1341 that __builtin_return_address (0) works.
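
A quick way to make that check on a given target is a test like the
sketch below.  @code{__builtin_return_address} is a GCC builtin (also
accepted by compatible compilers), not standard C++; the two function
names are hypothetical.

```cpp
// GCC-specific: __builtin_return_address (0) yields the address the
// current function will return to.  noinline keeps a real call frame.
__attribute__((noinline)) void* current_return_address() {
  return __builtin_return_address(0);
}

// A very weak sanity check: the builtin produced some non-null address.
bool return_address_looks_valid() {
  return current_return_address() != 0;
}
```

A non-null result is only a smoke test; it does not prove that the
value is the correct return address on a port where the builtin is
miscompiled.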
1342
1343 @subsection Specific Targets
1344
1345 For the alpha, the __unwind_function will be something resembling:
1346
1347 @example
1348 void
1349 __unwind_function(void *ptr)
1350 @{
1351   /* First frame */
1352   asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
1353   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1354
1355   /* Second frame */
1356   asm ("ldq $15, 8($30)"); /* fp */
1357   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1358
1359   /* Return */
1360   asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
1361 @}
1362 @end example
1363
1364 @noindent
1365 However, there are a few problems preventing it from working.  First of
1366 all, the gcc-internal function @code{__builtin_return_address} needs to
1367 work given an argument of 0 for the alpha.  As it stands as of August
1368 30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
1369 will definitely not work on the alpha.  Instead, we need to define
1370 the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
1371 @code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
1372 definition for @code{RETURN_ADDR_RTX}.
1373
1374 In addition (and more importantly), we need a way to reliably find the
1375 frame pointer on the alpha.  The use of the value 8 above to restore the
1376 frame pointer (register 15) is incorrect.  On many systems, the frame
1377 pointer is consistently offset to a specific point on the stack.  On the
1378 alpha, however, the frame pointer is pushed last.  First the return
1379 address is stored, then any other registers are saved (e.g., @code{s0}),
1380 and finally the frame pointer is put in place.  So @code{fp} could have
1381 an offset of 8, but if the calling function saved any registers at all,
1382 they add to the offset.
1383
1384 The only places the frame size is noted are with the @samp{.frame}
1385 directive, for use by the debugger and the OSF exception handling model
1386 (useless to us), and in the initial computation of the new value for
1387 @code{sp}, the stack pointer.  For example, the function may start with:
1388
1389 @example
1390 lda $30,-32($30)
1391 .frame $15,32,$26,0
1392 @end example
1393
1394 @noindent
The 32 above is exactly the value we need.  With this, we can be sure
that the frame pointer is stored 8 bytes less (in this case, at 24(sp)).
1397 The drawback is that there is no way that I (Brendan) have found to let
1398 us discover the size of a previous frame @emph{inside} the definition
1399 of @code{__unwind_function}.
1400
1401 So to accomplish exception handling support on the alpha, we need two
1402 things: first, a way to figure out where the frame pointer was stored,
1403 and second, a functional @code{__builtin_return_address} implementation
1404 for except.c to be able to use it.
1405
1406 Or just support DWARF 2 unwind info.
1407
1408 @subsection New Backend Exception Support
1409
1410 This subsection discusses various aspects of the design of the
1411 data-driven model being implemented for the exception handling backend.
1412
1413 The goal is to generate enough data during the compilation of user code,
1414 such that we can dynamically unwind through functions at run time with a
1415 single routine (@code{__throw}) that lives in libgcc.a, built by the
1416 compiler, and dispatch into associated exception handlers.
1417
1418 This information is generated by the DWARF 2 debugging backend, and
1419 includes all of the information __throw needs to unwind an arbitrary
1420 frame.  It specifies where all of the saved registers and the return
1421 address can be found at any point in the function.
1422
1423 Major disadvantages when enabling exceptions are:
1424
1425 @itemize @bullet
1426 @item
Code that would otherwise use caller-saved registers cannot do so when
flow can be transferred into that code from an exception handler.  In
high-performance code this should not usually be true, so the effects
should be minimal.
1430
1431 @end itemize
1432
1433 @subsection Backend Exception Support
1434
The backend must be extended to fully support exceptions.  Right now
there are a few hooks from that backend into the alpha exception
handling backend that resides in the C++ frontend; these hooks allow
exception handling to work in g++.  An exception region is a segment of generated
1439 code that has a handler associated with it.  The exception regions are
1440 denoted in the generated code as address ranges denoted by a starting PC
1441 value and an ending PC value of the region.  Some of the limitations
1442 with this scheme are:
1443
1444 @itemize @bullet
1445 @item
1446 The backend replicates insns for such things as loop unrolling and
1447 function inlining.  Right now, there are no hooks into the frontend's
1448 exception handling backend to handle the replication of insns.  When
1449 replication happens, a new exception region descriptor needs to be
1450 generated for the new region.
1451
1452 @item
1453 The backend expects to be able to rearrange code, for things like jump
optimization.  Any rearranging of the code needs to have exception region
1455 descriptors updated appropriately.
1456
1457 @item
1458 The backend can eliminate dead code.  Any associated exception region
1459 descriptor that refers to fully contained code that has been eliminated
1460 should also be removed, although not doing this is harmless in terms of
1461 semantics.
1462
1463 @end itemize
1464
1465 The above is not meant to be exhaustive, but does include all things I
1466 have thought of so far.  I am sure other limitations exist.
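
The address-range representation of exception regions described at the
start of this subsection can be sketched as a table lookup.  Everything
in this example is hypothetical: the structure layout, the half-open
@code{[start_pc, end_pc)} convention, and the rule that the narrowest
enclosing region wins are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical exception region descriptor.
struct ToyRegion {
  unsigned long start_pc;  // first address covered
  unsigned long end_pc;    // one past the last address covered (assumed)
  int handler_id;          // stands in for the handler's address
};

// Returns the narrowest region containing pc, or 0 if there is none.
const ToyRegion* toy_find_region(const ToyRegion* table, std::size_t count,
                                 unsigned long pc) {
  const ToyRegion* best = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const ToyRegion& r = table[i];
    if (pc < r.start_pc || pc >= r.end_pc) continue;
    if (!best ||
        (r.end_pc - r.start_pc) < (best->end_pc - best->start_pc))
      best = &r;
  }
  return best;
}

// Looks up pc in a small fixed table; returns the handler id, or 0.
int toy_region_demo(unsigned long pc) {
  static const ToyRegion table[] = {
    { 0x100, 0x200, 1 },  // outer region
    { 0x140, 0x180, 2 },  // region nested inside the outer one
  };
  const ToyRegion* r = toy_find_region(table, 2, pc);
  return r ? r->handler_id : 0;
}
```

The limitations listed above follow directly from this shape: whenever
the backend copies, moves or deletes the instructions in a range, the
corresponding table entries must be duplicated, updated or dropped.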
1467
1468 Below are some notes on the migration of the exception handling code
1469 backend from the C++ frontend to the backend.
1470
1471 NOTEs are to be used to denote the start of an exception region, and the
1472 end of the region.  I presume that the interface used to generate these
1473 notes in the backend would be two functions, start_exception_region and
1474 end_exception_region (or something like that).  The frontends are
1475 required to call them in pairs.  When marking the end of a region, an
1476 argument can be passed to indicate the handler for the marked region.
1477 This can be passed in many ways, currently a tree is used.  Another
1478 possibility would be insns for the handler, or a label that denotes a
handler.  I have a feeling insns might be the best way to pass it.
Semantics are, if an exception is thrown inside the region, control is
transferred unconditionally to the handler.  If control passes through
1482 the handler, then the backend is to rethrow the exception, in the
1483 context of the end of the original region.  The handler is protected by
1484 the conventional mechanisms; it is the frontend's responsibility to
1485 protect the handler, if special semantics are required.
1486
This is a very low level view, and it would be nice if the backend
1488 supported a somewhat higher level view in addition to this view.  This
1489 higher level could include source line number, name of the source file,
1490 name of the language that threw the exception and possibly the name of
1491 the exception.  Kenner may want to rope you into doing more than just
1492 the basics required by C++.  You will have to resolve this.  He may want
1493 you to do support for non-local gotos, first scan for exception handler,
1494 if none is found, allow the debugger to be entered, without any cleanups
being done.  To do this, the backend would have to know the difference
between a cleanup-rethrower and a real handler; it would also have to
have a way to know if a handler `matches' a thrown exception, and this
is frontend specific.
1499
1500 The stack unwinder is one of the hardest parts to do.  It is highly
machine dependent.  The form that Kenner seemed to like was a couple of
1502 macros, that would do the machine dependent grunt work.  One preexisting
1503 function that might be of some use is __builtin_return_address ().  One
1504 macro he seemed to want was __builtin_return_address, and the other
1505 would do the hard work of fixing up the registers, adjusting the stack
1506 pointer, frame pointer, arg pointer and so on.
1507
1508
1509 @node Free Store, Mangling, Exception Handling, Top
1510 @section Free Store
1511
1512 @code{operator new []} adds a magic cookie to the beginning of arrays
1513 for which the number of elements will be needed by @code{operator delete
1514 []}.  These are arrays of objects with destructors and arrays of objects
1515 that define @code{operator delete []} with the optional size_t argument.
1516 This cookie can be examined from a program as follows:
1517
1518 @example
1519 typedef unsigned long size_t;
1520 extern "C" int printf (const char *, ...);
1521
1522 size_t nelts (void *p)
1523 @{
1524   struct cookie @{
1525     size_t nelts __attribute__ ((aligned (sizeof (double))));
1526   @};
1527
1528   cookie *cp = (cookie *)p;
1529   --cp;
1530
1531   return cp->nelts;
1532 @}
1533
1534 struct A @{
1535   ~A() @{ @}
1536 @};
1537
int main()
@{
  A *ap = new A[3];
  printf ("%lu\n", nelts (ap));
@}
1543 @end example
1544
1545 @section Linkage
1546 The linkage code in g++ is horribly twisted in order to meet two design goals:
1547
1548 1) Avoid unnecessary emission of inlines and vtables.
1549
1550 2) Support pedantic assemblers like the one in AIX.
1551
1552 To meet the first goal, we defer emission of inlines and vtables until
1553 the end of the translation unit, where we can decide whether or not they
1554 are needed, and how to emit them if they are.
1555
1556 @node Mangling, Concept Index, Free Store, Top
1557 @section Function name mangling for C++ and Java
1558
Both C++ and Java provide overloaded functions and methods,
which are methods with the same name but different parameter lists.
1561 Selecting the correct version is done at compile time.
1562 Though the overloaded functions have the same name in the source code,
1563 they need to be translated into different assembler-level names,
1564 since typical assemblers and linkers cannot handle overloading.
1565 This process of encoding the parameter types with the method name
1566 into a unique name is called @dfn{name mangling}.  The inverse
1567 process is called @dfn{demangling}.
1568
1569 It is convenient that C++ and Java use compatible mangling schemes,
since that makes life easier for tools such as gdb, and it eases
1571 integration between C++ and Java.
1572
Note there is also a standard "Java Native Interface" (JNI) which
1574 implements a different calling convention, and uses a different
1575 mangling scheme.  The JNI is a rather abstract ABI so Java can call methods
1576 written in C or C++;
1577 we are concerned here about a lower-level interface primarily
1578 intended for methods written in Java, but that can also be used for C++
1579 (and less easily C).
1580
1581 @subsection Method name mangling
1582
1583 C++ mangles a method by emitting the function name, followed by @code{__},
1584 followed by encodings of any method qualifiers (such as @code{const}),
1585 followed by the mangling of the method's class,
1586 followed by the mangling of the parameters, in order.
1587
1588 For example @code{Foo::bar(int, long) const} is mangled
1589 as @samp{bar__C3Fooil}.
1590
For a constructor, the method name is left out.
That is, @code{Foo::Foo(int, long)} is mangled
as @samp{__3Fooil}.  (A constructor cannot be @code{const}-qualified,
so no @samp{C} appears.)
1594
1595 GNU Java does the same.
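
The rule above can be sketched in a few lines of C++.  This is only an
illustration of the string layout, not the code G++ uses; the names
@code{length_prefixed} and @code{mangle_method} are hypothetical, and the
parameter codes are assumed to have been mangled already.

```cpp
#include <string>

// Hypothetical helper (not G++ code): length-prefixed simple name,
// e.g. "Foo" -> "3Foo".  Handles plain ASCII identifiers only.
std::string length_prefixed(const std::string& name) {
    return std::to_string(name.size()) + name;
}

// Hypothetical helper: assemble a mangled method name from its parts.
// `param_codes` is the already-mangled parameter list, e.g. "il" for
// (int, long).  Pass an empty `method` to get the constructor form.
std::string mangle_method(const std::string& method,
                          const std::string& class_name,
                          bool is_const,
                          const std::string& param_codes) {
    std::string out = method;               // function name
    out += "__";                            // separator
    if (is_const) out += 'C';               // method qualifier
    out += length_prefixed(class_name);     // mangling of the class
    out += param_codes;                     // parameters, in order
    return out;
}
```

With these assumptions, @code{mangle_method("bar", "Foo", true, "il")}
produces @samp{bar__C3Fooil}, matching the example above.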
1596
1597 @subsection Primitive types
1598
1599 The C++ types @code{int}, @code{long}, @code{short}, @code{char},
1600 and @code{long long} are mangled as @samp{i}, @samp{l},
1601 @samp{s}, @samp{c}, and @samp{x}, respectively.
1602 The corresponding unsigned types have @samp{U} prefixed
1603 to the mangling.  The type @code{signed char} is mangled @samp{Sc}.
1604
1605 The C++ and Java floating-point types @code{float} and @code{double}
1606 are mangled as @samp{f} and @samp{d} respectively.
1607
1608 The C++ @code{bool} type and the Java @code{boolean} type are
1609 mangled as @samp{b}.
1610
1611 The C++ @code{wchar_t} and the Java @code{char} types are
1612 mangled as @samp{w}.
1613
1614 The Java integral types @code{byte}, @code{short}, @code{int}
1615 and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
1616 and @samp{x}, respectively.
1617
1618 C++ code that has included @code{javatypes.h} will mangle
1619 the typedefs  @code{jbyte}, @code{jshort}, @code{jint}
1620 and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i},
1621 and @samp{x}.  (This has not been implemented yet.)
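
The primitive-type codes listed in this subsection amount to two small
lookup tables.  The sketch below restates them in C++; the function names
are hypothetical, the input is assumed to be a type spelled exactly as in
the text (for example @code{unsigned long}), and unknown spellings simply
yield an empty string.

```cpp
#include <map>
#include <string>

// Illustrative lookup, not G++ code.  Unknown keys yield "".
static std::string lookup(const std::map<std::string, std::string>& table,
                          const std::string& key) {
    auto it = table.find(key);
    return it == table.end() ? std::string() : it->second;
}

// Code for a C++ primitive type, following the tables in the text.
std::string mangle_cxx_primitive(const std::string& type) {
    static const std::map<std::string, std::string> integral = {
        {"int", "i"}, {"long", "l"}, {"short", "s"},
        {"char", "c"}, {"long long", "x"}};
    static const std::map<std::string, std::string> other = {
        {"float", "f"}, {"double", "d"}, {"bool", "b"}, {"wchar_t", "w"}};

    if (type == "signed char") return "Sc";

    // Unsigned integer types get a 'U' prefix on the base code.
    const std::string prefix = "unsigned ";
    if (type.compare(0, prefix.size(), prefix) == 0) {
        std::string code = lookup(integral, type.substr(prefix.size()));
        return code.empty() ? code : "U" + code;
    }

    std::string code = lookup(integral, type);
    return code.empty() ? lookup(other, type) : code;
}

// Code for a Java primitive type.
std::string mangle_java_primitive(const std::string& type) {
    static const std::map<std::string, std::string> java = {
        {"byte", "c"}, {"short", "s"}, {"int", "i"}, {"long", "x"},
        {"float", "f"}, {"double", "d"}, {"boolean", "b"}, {"char", "w"}};
    return lookup(java, type);
}
```

Note how the Java @code{long} and the C++ @code{long long} share the
code @samp{x}, while the C++ @code{long} uses @samp{l}.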
1622
1623 @subsection Mangling of simple names
1624
1625 A simple class, package, template, or namespace name is
1626 encoded as the number of characters in the name, followed by
1627 the actual characters.  Thus the class @code{Foo}
1628 is encoded as @samp{3Foo}.
1629
1630 If any of the characters in the name are not alphanumeric
(i.e., not one of the standard ASCII letters, digits, or '_'),
1632 or the initial character is a digit, then the name is
1633 mangled as a sequence of encoded Unicode letters.
1634 A Unicode encoding starts with a @samp{U} to indicate
1635 that Unicode escapes are used, followed by the number of
1636 bytes used by the Unicode encoding, followed by the bytes
representing the encoding.  ASCII letters and
1638 non-initial digits are encoded without change.  However, all
1639 other characters (including underscore and initial digits) are
1640 translated into a sequence starting with an underscore,
1641 followed by the big-endian 4-hex-digit lower-case encoding of the character.
1642
1643 If a method name contains Unicode-escaped characters, the
1644 entire mangled method name is followed by a @samp{U}.
1645
1646 For example, the method @code{X\u0319::M\u002B(int)} is encoded as
1647 @samp{M_002b__U6X_0319iU}.
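
As a minimal sketch of the simple-name rule for class, package, template,
and namespace names, the following C++ function takes a name as UTF-16
code units and produces its encoding.  The function name
@code{encode_simple_name} and the choice of @code{std::u16string} as input
are assumptions made for illustration; this is not the routine G++ uses,
and it does not handle the trailing @samp{U} added to method names.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

static bool is_ascii_letter(char16_t c) {
    return (c >= u'a' && c <= u'z') || (c >= u'A' && c <= u'Z');
}

static bool is_ascii_digit(char16_t c) {
    return c >= u'0' && c <= u'9';
}

// Hypothetical helper: encode a simple name per the rules in the text.
std::string encode_simple_name(const std::u16string& name) {
    // Escaping is needed for an initial digit or any character that is
    // not an ASCII letter, digit, or underscore.
    bool needs_escape = !name.empty() && is_ascii_digit(name[0]);
    for (char16_t c : name)
        if (!is_ascii_letter(c) && !is_ascii_digit(c) && c != u'_')
            needs_escape = true;

    std::string body;
    for (std::size_t i = 0; i < name.size(); ++i) {
        char16_t c = name[i];
        // In escaped form only letters and non-initial digits pass through.
        bool keep = is_ascii_letter(c) || (is_ascii_digit(c) && i > 0);
        if (!needs_escape || keep) {
            body += static_cast<char>(c);
        } else {
            char buf[6];  // '_' + 4 hex digits + terminating NUL
            std::snprintf(buf, sizeof buf, "_%04x", static_cast<unsigned>(c));
            body += buf;
        }
    }
    return (needs_escape ? "U" : "") + std::to_string(body.size()) + body;
}
```

Under these assumptions the class name @code{Foo} encodes as @samp{3Foo}
and @code{X\u0319} encodes as @samp{U6X_0319}, the class part of the
example above.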
1648
1649 @subsection Pointer and reference types
1650
1651 A C++ pointer type is mangled as @samp{P} followed by the
1652 mangling of the type pointed to.
1653
A C++ reference type is mangled as @samp{R} followed by the
1655 mangling of the type referenced.
1656
1657 A Java object reference type is equivalent
to a C++ pointer parameter, so we mangle such a parameter type
1659 as @samp{P} followed by the mangling of the class name.
1660
1661 @subsection Qualified names
1662
1663 Both C++ and Java allow a class to be lexically nested inside another
1664 class.  C++ also supports namespaces (not yet implemented by G++).
1665 Java also supports packages.
1666
1667 These are all mangled the same way:  First the letter @samp{Q}
1668 indicates that we are emitting a qualified name.
1669 That is followed by the number of parts in the qualified name.
1670 If that number is 9 or less, it is emitted with no delimiters.
1671 Otherwise, an underscore is written before and after the count.
1672 Then follows each part of the qualified name, as described above.
1673
1674 For example @code{Foo::\u0319::Bar} is encoded as
1675 @samp{Q33FooU5_03193Bar}.
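
The qualified-name layout can be sketched as follows.  The function
@code{encode_qualified} is a hypothetical name, and each element of its
argument is assumed to be a part that has already been encoded as a simple
name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: join already-encoded parts into a qualified name.
std::string encode_qualified(const std::vector<std::string>& encoded_parts) {
    const std::string count = std::to_string(encoded_parts.size());
    std::string out = "Q";
    if (encoded_parts.size() <= 9)
        out += count;              // single digit, no delimiters
    else
        out += "_" + count + "_";  // larger counts are wrapped in underscores
    for (const std::string& part : encoded_parts)
        out += part;
    return out;
}
```

Given the parts @samp{3Foo}, @samp{U5_0319}, and @samp{3Bar}, this yields
@samp{Q33FooU5_03193Bar}, as in the example above.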
1676
1677 @subsection Templates
1678
1679 A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
1682 parameters.  If a template parameter is a type, it is written
1683 as a @samp{Z} followed by the encoding of the type.
1684
1685 A function template specialization (either an instantiation or an
1686 explicit specialization) is encoded by an @samp{H} followed by the
1687 encoding of the template parameters, as described above, followed by
an @samp{_}, the encoding of the argument types of the template function (not the
1689 specialization), another @samp{_}, and the return type.  (Like the
1690 argument types, the return type is the return type of the function
1691 template, not the specialization.)  Template parameters in the argument
1692 and return types are encoded by an @samp{X} for type parameters, or a
1693 @samp{Y} for constant parameters, and an index indicating their position
1694 in the template parameter list declaration.
1695
1696 @subsection Arrays
1697
1698 C++ array types are mangled by emitting @samp{A}, followed by
1699 the length of the array, followed by an @samp{_}, followed by
1700 the mangling of the element type.  Of course, normally
array parameter types decay into pointer types, so you
1702 don't see this.
1703
1704 Java arrays are objects.  A Java type @code{T[]} is mangled
1705 as if it were the C++ type @code{JArray<T>}.
1706 For example @code{java.lang.String[]} is encoded as
1707 @samp{Pt6JArray1ZPQ34java4lang6String}.
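
To show how the array rules compose with the class-template rule, here is
a small C++ sketch.  All three function names are hypothetical, element
and argument types are assumed to be mangled already, and only type
template parameters with plain ASCII template names are handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: class template instantiation with type
// parameters only: 't', template name, parameter count, then 'Z' + type.
std::string encode_class_template(const std::string& template_name,
                                  const std::vector<std::string>& type_args) {
    std::string out = "t";
    out += std::to_string(template_name.size()) + template_name;
    out += std::to_string(type_args.size());
    for (const std::string& arg : type_args)
        out += "Z" + arg;
    return out;
}

// Hypothetical helper: C++ array type: 'A', length, '_', element type.
std::string encode_cxx_array(std::size_t length, const std::string& elem) {
    return "A" + std::to_string(length) + "_" + elem;
}

// Hypothetical helper: Java T[] is treated as a pointer to JArray<T>.
std::string encode_java_array(const std::string& elem) {
    return "P" + encode_class_template("JArray", {elem});
}
```

Passing @samp{PQ34java4lang6String} (the object reference type for
@code{java.lang.String}) to @code{encode_java_array} reproduces
@samp{Pt6JArray1ZPQ34java4lang6String}.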
1708
1709 @subsection Table of demangling code characters
1710
1711 The following special characters are used in mangling:
1712
1713 @table @samp
1714 @item A
1715 Indicates a C++ array type.
1716
1717 @item b
1718 Encodes the C++ @code{bool} type,
1719 and the Java @code{boolean} type.
1720
1721 @item c
1722 Encodes the C++ @code{char} type, and the Java @code{byte} type.
1723
1724 @item C
1725 A modifier to indicate a @code{const} type.
1726 Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).
1728
1729 @item d
1730 Encodes the C++ and Java @code{double} types.
1731
1732 @item e
1733 Indicates extra unknown arguments @code{...}.
1734
1735 @item f
1736 Encodes the C++ and Java @code{float} types.
1737
1738 @item F
1739 Used to indicate a function type.
1740
1741 @item H
1742 Used to indicate a template function.
1743
1744 @item i
1745 Encodes the C++ and Java @code{int} types.
1746
1747 @item J
1748 Indicates a complex type.
1749
1750 @item l
1751 Encodes the C++ @code{long} type.
1752
1753 @item P
1754 Indicates a pointer type.  Followed by the type pointed to.
1755
1756 @item Q
1757 Used to mangle qualified names, which arise from nested classes.
1758 Should also be used for namespaces (?).
1759 In Java used to mangle package-qualified names, and inner classes.
1760
1761 @item r
1762 Encodes the GNU C++ @code{long double} type.
1763
1764 @item R
1765 Indicates a reference type.  Followed by the referenced type.
1766
1767 @item s
Encodes the C++ and Java @code{short} types.
1769
1770 @item S
1771 A modifier that indicates that the following integer type is signed.
1772 Only used with @code{char}.
1773
1774 Also used as a modifier to indicate a static member function.
1775
1776 @item t
1777 Indicates a template instantiation.
1778
1779 @item T
1780 A back reference to a previously seen type.
1781
1782 @item U
1783 A modifier that indicates that the following integer type is unsigned.
1784 Also used to indicate that the following class or namespace name
1785 is encoded using Unicode-mangling.
1786
1787 @item v
1788 Encodes the C++ and Java @code{void} types.
1789
1790 @item V
A modifier for a @code{volatile} type or method.
1792
1793 @item w
Encodes the C++ @code{wchar_t} type, and the Java @code{char} type.
1795
1796 @item x
1797 Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.
1798
1799 @item X
1800 Encodes a template type parameter, when part of a function type.
1801
1802 @item Y
1803 Encodes a template constant parameter, when part of a function type.
1804
1805 @item Z
1806 Used for template type parameters.
1807
1808 @end table
1809
1810 The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
1811 also seem to be used for obscure purposes ...
1812
1813 @node Concept Index,  , Mangling, Top
1814
1815 @section Concept Index
1816
1817 @printindex cp
1818
1819 @bye