1\input texinfo @c -*-texinfo-*-
2@c %**start of header
3@setfilename g++int.info
4@settitle G++ internals
5@setchapternewpage odd
6@c %**end of header
7
8@node Top, Limitations of g++, (dir), (dir)
9@chapter Internal Architecture of the Compiler
10
This is meant to describe the C++ front-end for gcc in detail.
Questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}.
13
14@menu
15* Limitations of g++::
16* Routines::
17* Implementation Specifics::
18* Glossary::
19* Macros::
20* Typical Behavior::
21* Coding Conventions::
22* Templates::
23* Access Control::
24* Error Reporting::
25* Parser::
26* Copying Objects::
27* Exception Handling::
28* Free Store::
* Mangling:: Function name mangling for C++ and Java
30* Concept Index::
31@end menu
32
33@node Limitations of g++, Routines, Top, Top
34@section Limitations of g++
35
36@itemize @bullet
37@item
Limitations on input source code: 240 nesting levels with the parser
stack size (@code{YYSTACKSIZE}) set to 500 (the default), at a cost of
around 16.4k of swap space per nesting level. The parser needs about
2.09 * (number of nesting levels) worth of stack space.
42
43@cindex pushdecl_class_level
44@item
45I suspect there are other uses of pushdecl_class_level that do not call
46set_identifier_type_value in tandem with the call to
47pushdecl_class_level. It would seem to be an omission.
48
49@cindex access checking
50@item
Access checking is unimplemented for nested types.
52
53@cindex @code{volatile}
54@item
55@code{volatile} is not implemented in general.
56
57@end itemize
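
As a rough check of the parser figures quoted above, the following sketch
simply multiplies them out. It is only arithmetic on the numbers given in
this section; the constants and the function names are hypothetical and
do not exist in the compiler.

```cpp
#include <cassert>

// Figures quoted in the text above, treated here as approximate constants.
constexpr double kStackSlotsPerLevel = 2.09;  // parser stack entries per nesting level
constexpr double kSwapKPerLevel = 16.4;       // swap space (k) per nesting level

// Hypothetical helper: approximate parser stack entries for a nesting depth.
constexpr double stack_slots_needed(int nesting_levels) {
  return kStackSlotsPerLevel * nesting_levels;
}

// Hypothetical helper: approximate swap space (k) for a nesting depth.
constexpr double swap_k_needed(int nesting_levels) {
  return kSwapKPerLevel * nesting_levels;
}

// Hypothetical helper: deepest nesting that fits in a given stack size.
constexpr int max_nesting_levels(int yystacksize) {
  return static_cast<int>(yystacksize / kStackSlotsPerLevel);
}
```

Under these assumptions, @code{stack_slots_needed (240)} is about 501.6,
right at the default @code{YYSTACKSIZE} of 500, and
@code{swap_k_needed (240)} is about 3936k.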
58
59@node Routines, Implementation Specifics, Limitations of g++, Top
60@section Routines
61
62This section describes some of the routines used in the C++ front-end.
63
@code{build_vtable} and @code{prepare_fresh_vtable} are used only within
the @file{cp-class.c} file, and only in @code{finish_struct} and
@code{modify_vtable_entries}.
67
68@code{build_vtable}, @code{prepare_fresh_vtable}, and
69@code{finish_struct} are the only routines that set @code{DECL_VPARENT}.
70
@code{finish_struct} can steal the virtual function table from parents;
this prohibits @code{related_vslot} from working. When
@code{finish_struct} steals, we know that
74
75@example
76get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
77@end example
78
79@noindent
80will get the related binfo.
81
82@code{layout_basetypes} does something with the VIRTUALS.
83
Supposedly (according to Tiemann) most of the breadth-first searching
done, as in @code{get_base_distance} and in @code{get_binfo}, was not
the result of any design decision. I have since found out that at least
one part of the compiler needs the notion of depth-first binfo
searching, so I am going to try to convert the whole thing; it should
just work. The term left-most refers to the depth-first left-most node.
It uses @code{MAIN_VARIANT == type} as the condition to get left-most,
because the things that have @code{BINFO_OFFSET}s of zero are shared and
will have themselves as their own @code{MAIN_VARIANT}s. The non-shared
right ones are copies of the left-most one; hence if a node is its own
@code{MAIN_VARIANT}, we know it IS a left-most one, and if it is not, it
is a non-left-most one.
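
The notion of a depth-first left-most node can be illustrated on a toy
lattice. The sketch below is not the compiler's data structure:
@code{ToyBinfo}, its fields, and @code{find_leftmost} are hypothetical
names used only to show the traversal order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a binfo: a type name plus its direct bases,
// listed in declaration (left-to-right) order.
struct ToyBinfo {
  std::string type;
  std::vector<const ToyBinfo*> bases;
};

// Depth-first, left-to-right search: the first node found for `type`
// is the left-most occurrence of that base in the lattice.
const ToyBinfo* find_leftmost(const ToyBinfo* node, const std::string& type) {
  if (node == nullptr) return nullptr;
  if (node->type == type) return node;
  for (const ToyBinfo* base : node->bases) {
    if (const ToyBinfo* hit = find_leftmost(base, type)) return hit;
  }
  return nullptr;
}
```

With a lattice in which @code{D} derives from @code{B} and @code{C}, and
each of those has its own copy of @code{A}, the search from @code{D}
reaches the @code{A} under @code{B} before the one under @code{C}.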
96
@code{get_base_distance}'s path and distance matter in its use in:
98
99@itemize @bullet
100@item
101@code{prepare_fresh_vtable} (the code is probably wrong)
102@item
@code{init_vfields} depends upon distance, probably in a safe way.
@code{build_offset_ref} might use partial paths to do further lookups.
@code{hack_identifier} is probably not properly checking access.
106
107@item
108@code{get_first_matching_virtual} probably should check for
109@code{get_base_distance} returning -2.
110
111@item
112@code{resolve_offset_ref} should be called in a more deterministic
113manner. Right now, it is called in some random contexts, like for
114arguments at @code{build_method_call} time, @code{default_conversion}
115time, @code{convert_arguments} time, @code{build_unary_op} time,
116@code{build_c_cast} time, @code{build_modify_expr} time,
117@code{convert_for_assignment} time, and
118@code{convert_for_initialization} time.
119
However, there are still more contexts in which it needs to be called;
one was the ever-simple:
122
123@example
124if (obj.*pmi != 7)
125 @dots{}
126@end example
127
It seems that the problems were due to the fact that @code{TREE_TYPE} of
the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
130of the referent (like @code{INTEGER_TYPE}). This problem was fixed by
131changing @code{default_conversion} to check @code{TREE_CODE (x)},
132instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
133was @code{OFFSET_TYPE}.
134
135@end itemize
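
For reference, the @code{obj.*pmi} expression discussed above is an
ordinary pointer-to-data-member access. The following is a small
standalone illustration of that construct at the language level; the
names @code{S}, @code{pmi}, and @code{differs_from_seven} are made up
for this example, and nothing here is compiler-internal code.

```cpp
#include <cassert>

struct S {
  int i;
  int j;
};

// Dereference a pointer to data member and compare, as in the
// `obj.*pmi != 7` expression discussed above.
bool differs_from_seven(const S& obj, int S::*pmi) {
  return obj.*pmi != 7;
}
```

Note that the type of @code{obj.*pmi} is the type of the referent
(@code{int} here), which matches the observation that checking only the
type of the expression is not enough to recognize it.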
136
137@node Implementation Specifics, Glossary, Routines, Top
138@section Implementation Specifics
139
140@itemize @bullet
141@item Explicit Initialization
142
143The global list @code{current_member_init_list} contains the list of
144mem-initializers specified in a constructor declaration. For example:
145
146@example
147foo::foo() : a(1), b(2) @{@}
148@end example
149
150@noindent
151will initialize @samp{a} with 1 and @samp{b} with 2.
152@code{expand_member_init} places each initialization (a with 1) on the
153global list. Then, when the fndecl is being processed,
154@code{emit_base_init} runs down the list, initializing them. It used to
155be the case that g++ first ran down @code{current_member_init_list},
156then ran down the list of members initializing the ones that weren't
157explicitly initialized. Things were rewritten to perform the
158initializations in order of declaration in the class. So, for the above
159example, @samp{a} and @samp{b} will be initialized in the order that
160they were declared:
161
162@example
163class foo @{ public: int b; int a; foo (); @};
164@end example
165
166@noindent
167Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
168initialized with 1, regardless of how they're listed in the mem-initializer.
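
The declaration-order rule can be observed directly from user code. The
sketch below is a standalone test program fragment, not part of the
compiler; @code{note}, @code{g_order}, and @code{observed_order} are
hypothetical helpers that record which member is initialized first.

```cpp
#include <cassert>
#include <string>

// Records the order in which members are initialized.
static std::string g_order;

static int note(char name, int value) {
  g_order += name;
  return value;
}

class foo {
 public:
  int b;  // declared first, so initialized first
  int a;  // declared second
  // Mem-initializers deliberately listed in the opposite order.
  foo() : a(note('a', 1)), b(note('b', 2)) {}
};

// Constructs a foo and returns the observed initialization order.
std::string observed_order() {
  g_order.clear();
  foo f;
  (void)f;
  return g_order;
}
```

A compiler may warn that the mem-initializers are listed out of order;
that warning is expected here, since the mismatch is the point of the
example.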
169
170@item Argument Matching
171
172In early 1993, the argument matching scheme in @sc{gnu} C++ changed
173significantly. The original code was completely replaced with a new
174method that will, hopefully, be easier to understand and make fixing
175specific cases much easier.
176
177The @samp{-fansi-overloading} option is used to enable the new code; at
178some point in the future, it will become the default behavior of the
179compiler.
180
181The file @file{cp-call.c} contains all of the new work, in the functions
182@code{rank_for_overload}, @code{compute_harshness},
183@code{compute_conversion_costs}, and @code{ideal_candidate}.
184
185Instead of using obscure numerical values, the quality of an argument
186match is now represented by clear, individual codes. The new data
187structure @code{struct harshness} (it used to be an @code{unsigned}
188number) contains:
189
190@enumerate a
191@item the @samp{code} field, to signify what was involved in matching two
192arguments;
193@item the @samp{distance} field, used in situations where inheritance
194decides which function should be called (one is ``closer'' than
195another);
196@item and the @samp{int_penalty} field, used by some codes as a tie-breaker.
197@end enumerate
198
199The @samp{code} field is a number with a given bit set for each type of
200code, OR'd together. The new codes are:
201
202@itemize @bullet
203@item @code{EVIL_CODE}
204The argument was not a permissible match.
205
206@item @code{CONST_CODE}
207Currently, this is only used by @code{compute_conversion_costs}, to
208distinguish when a non-@code{const} member function is called from a
209@code{const} member function.
210
211@item @code{ELLIPSIS_CODE}
212A match against an ellipsis @samp{...} is considered worse than all others.
213
214@item @code{USER_CODE}
215Used for a match involving a user-defined conversion.
216
217@item @code{STD_CODE}
218A match involving a standard conversion.
219
220@item @code{PROMO_CODE}
221A match involving an integral promotion. For these, the
222@code{int_penalty} field is used to handle the ARM's rule (XXX cite)
that a smaller @code{unsigned} type should promote to an @code{int}, not
224to an @code{unsigned int}.
225
226@item @code{QUAL_CODE}
227Used to mark use of qualifiers like @code{const} and @code{volatile}.
228
229@item @code{TRIVIAL_CODE}
230Used for trivial conversions. The @samp{int_penalty} field is used by
231@code{convert_harshness} to communicate further penalty information back
to @code{build_overload_call_real} when deciding which function should
be called.
234@end itemize
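
The promotion rule mentioned under @code{PROMO_CODE} can be checked at
the language level. This is a minimal sketch, assuming a typical
platform where @code{int} is wider than @code{unsigned char}; the alias
and variable names are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Unary plus applies the integral promotions to its operand.
using PromotedUChar = decltype(+static_cast<unsigned char>(0));
using PromotedUInt = decltype(+0u);

// unsigned char promotes to int (where int can hold all its values),
// while unsigned int is not changed by promotion.
constexpr bool uchar_promotes_to_int =
    std::is_same<PromotedUChar, int>::value;
constexpr bool uint_stays_unsigned =
    std::is_same<PromotedUInt, unsigned int>::value;
```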
235
236The functions @code{convert_to_aggr} and @code{build_method_call} use
237@code{compute_conversion_costs} to rate each argument's suitability for
238a given candidate function (that's how we get the list of candidates for
239@code{ideal_candidate}).
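
The shape of @code{struct harshness} described above can be sketched as
follows. This is an illustration only: the bit values, the @code{Toy}
names, and in particular the ordering used by @code{better_match} are
assumptions made for the example, not what @file{cp-call.c} does.

```cpp
#include <cassert>

// Hypothetical bit assignments; the real values live in the compiler
// sources and may differ.
enum ToyCode : unsigned {
  TOY_TRIVIAL  = 1u << 0,
  TOY_QUAL     = 1u << 1,
  TOY_PROMO    = 1u << 2,
  TOY_STD      = 1u << 3,
  TOY_USER     = 1u << 4,
  TOY_ELLIPSIS = 1u << 5,
  TOY_CONST    = 1u << 6,
  TOY_EVIL     = 1u << 7
};

// Toy analogue of `struct harshness`: the three fields named in the text.
struct ToyHarshness {
  unsigned code;    // OR of ToyCode bits
  int distance;     // inheritance distance
  int int_penalty;  // tie-breaker used by some codes
};

// True if the match is not permissible at all.
bool is_evil(const ToyHarshness& h) { return (h.code & TOY_EVIL) != 0; }

// Assumed ordering for illustration only: lower code, then smaller
// distance, then smaller int_penalty counts as a better match.
bool better_match(const ToyHarshness& a, const ToyHarshness& b) {
  if (a.code != b.code) return a.code < b.code;
  if (a.distance != b.distance) return a.distance < b.distance;
  return a.int_penalty < b.int_penalty;
}
```

Keeping the three fields separate, rather than packing them into one
@code{unsigned} as before, is what lets each tie-breaking step be read
off by name.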
240
241@item The Explicit Keyword
242
When @code{explicit} appears on a constructor, @code{grokdeclarator}
sets the field @code{DECL_NONCONVERTING_P}. That value is used by
245@code{build_method_call} and @code{build_user_type_conversion_1} to decide
246if a particular constructor should be used as a candidate for conversions.
247
248@end itemize
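
The user-visible effect of the @code{explicit} keyword described above
can be shown with the standard type traits. This is a language-level
illustration, not compiler code; the struct and variable names are
invented for the example.

```cpp
#include <cassert>
#include <type_traits>

struct Implicit {
  Implicit(int) {}           // usable as a converting constructor
};

struct Explicit {
  explicit Explicit(int) {}  // only usable for direct initialization
};

constexpr bool implicit_converts = std::is_convertible<int, Implicit>::value;
constexpr bool explicit_converts = std::is_convertible<int, Explicit>::value;
constexpr bool explicit_constructs = std::is_constructible<Explicit, int>::value;
```

Here the @code{explicit} constructor can still be called directly, but it
is not a candidate when an @code{int} must be converted implicitly.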
249
250@node Glossary, Macros, Implementation Specifics, Top
251@section Glossary
252
253@table @r
254@item binfo
255The main data structure in the compiler used to represent the
256inheritance relationships between classes. The data in the binfo can be
257accessed by the BINFO_ accessor macros.
258
259@item vtable
260@itemx virtual function table
261
262The virtual function table holds information used in virtual function
263dispatching. In the compiler, they are usually referred to as vtables,
or vtbls. The first index is not used in the normal way; I believe it
is probably used for the virtual destructor.
266
267@item vfield
268
269vfields can be thought of as the base information needed to build
270vtables. For every vtable that exists for a class, there is a vfield.
271See also vtable and virtual function table pointer. When a type is used
272as a base class to another type, the virtual function table for the
273derived class can be based upon the vtable for the base class, just
274extended to include the additional virtual methods declared in the
275derived class. The virtual function table from a virtual base class is
276never reused in a derived class. @code{is_normal} depends upon this.
277
278@item virtual function table pointer
279
280These are @code{FIELD_DECL}s that are pointer types that point to
281vtables. See also vtable and vfield.
282@end table
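
The relationship between a base class vtable and a derived class vtable
described in the glossary can be modelled by hand. The sketch below is a
hypothetical model, not the layout g++ emits: @code{Object},
@code{VTable}, @code{Slot}, and the helper functions are all invented
for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hand-rolled model; not the layout g++ actually emits.
struct Object;
using Slot = int (*)(const Object&);
using VTable = std::vector<Slot>;

struct Object {
  const VTable* vptr;  // plays the role of the virtual function table pointer
};

int base_f(const Object&) { return 1; }
int derived_f(const Object&) { return 2; }  // overrides the slot of base_f
int derived_g(const Object&) { return 3; }  // new virtual in the derived class

// Slot 0 is left unused here, mirroring the remark that the first index
// is not used in the normal way.
VTable make_base_vtable() { return VTable{nullptr, base_f}; }

// The derived table starts as a copy of the base table, replaces the
// overridden entry, and appends entries for the new virtual functions.
VTable make_derived_vtable(const VTable& base) {
  VTable v = base;
  v[1] = derived_f;
  v.push_back(derived_g);
  return v;
}

// Dispatch through the table pointer using a slot index.
int call(const Object& o, std::size_t index) { return (*o.vptr)[index](o); }
```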
283
284@node Macros, Typical Behavior, Glossary, Top
285@section Macros
286
287This section describes some of the macros used on trees. The list
288should be alphabetical. Eventually all macros should be documented
here.
290
291@table @code
292@item BINFO_BASETYPES
293A vector of additional binfos for the types inherited by this basetype.
294The binfos are fully unshared (except for virtual bases, in which
295case the binfo structure is shared).
296
297 If this basetype describes type D as inherited in C,
 and if the basetypes of D are E and F,
299 then this vector contains binfos for inheritance of E and F by C.
300
301Has values of:
302
303 TREE_VECs
304
305
306@item BINFO_INHERITANCE_CHAIN
307Temporarily used to represent specific inheritances. It usually points
308to the binfo associated with the lesser derived type, but it can be
309reversed by reverse_path. For example:
310
311@example
        Z       ZbY     least derived
        |
        Y       YbX
        |
        X       Xb      most derived
317
318TYPE_BINFO (X) == Xb
319BINFO_INHERITANCE_CHAIN (Xb) == YbX
320BINFO_INHERITANCE_CHAIN (Yb) == ZbY
321BINFO_INHERITANCE_CHAIN (Zb) == 0
322@end example
323
Not sure if the above is really true; get_base_distance has it point
towards the most derived type, opposite from above.
326
327Set by build_vbase_path, recursive_bounded_basetype_p,
328get_base_distance, lookup_field, lookup_fnfields, and reverse_path.
329
330What things can this be used on:
331
332 TREE_VECs that are binfos
333
334
335@item BINFO_OFFSET
336The offset where this basetype appears in its containing type.
337BINFO_OFFSET slot holds the offset (in bytes) from the base of the
338complete object to the base of the part of the object that is allocated
339on behalf of this `type'. This is always 0 except when there is
340multiple inheritance.
341
342Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.
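
The effect that @code{BINFO_OFFSET} records can be seen from user code by
measuring where each base subobject sits inside a complete object. This
is a minimal sketch; @code{Left}, @code{Right}, @code{Both}, and
@code{base_offset} are hypothetical names, and the exact offsets depend
on the target ABI.

```cpp
#include <cassert>
#include <cstddef>

struct Left  { int l; };
struct Right { int r; };
struct Both : Left, Right { int own; };

// Byte offset of a base subobject within a complete Both object.
template <typename Base>
std::ptrdiff_t base_offset(const Both& obj) {
  const char* whole = reinterpret_cast<const char*>(&obj);
  const char* part =
      reinterpret_cast<const char*>(static_cast<const Base*>(&obj));
  return part - whole;
}
```

On common ABIs the first base sits at offset 0 and the second base at a
nonzero offset, which is the multiple inheritance case mentioned above.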
343
344
345@item BINFO_VIRTUALS
346A unique list of functions for the virtual function table. See also
347TYPE_BINFO_VIRTUALS.
348
349What things can this be used on:
350
351 TREE_VECs that are binfos
352
353
354@item BINFO_VTABLE
355Used to find the VAR_DECL that is the virtual function table associated
356with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual
357function table pointer, see CLASSTYPE_VFIELD.
358
359What things can this be used on:
360
361 TREE_VECs that are binfos
362
363Has values of:
364
365 VAR_DECLs that are virtual function tables
366
367
368@item BLOCK_SUPERCONTEXT
369In the outermost scope of each function, it points to the FUNCTION_DECL
370node. It aids in better DWARF support of inline functions.
371
372
373@item CLASSTYPE_TAGS
374CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
375class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
376these and calls pushtag on them.)
377
378finish_struct scans these to produce TYPE_DECLs to add to the
379TYPE_FIELDS of the type.
380
It is expected that the name found in the TREE_PURPOSE slot is unique;
resolve_scope_to_name is one place that depends upon this
uniqueness.
384
385
386@item CLASSTYPE_METHOD_VEC
387The following is true after finish_struct has been called (on the
388class?) but not before. Before finish_struct is called, things are
389different to some extent. Contains a TREE_VEC of methods of the class.
390The TREE_VEC_LENGTH is the number of differently named methods plus one
391for the 0th entry. The 0th entry is always allocated, and reserved for
392ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
393Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL,
394there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a
395given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next
method that has the same name (but a different signature). It would
seem that using the DECL_CHAIN slot in this way does not prevent us
from calling pushdecl to put the method in the global scope (which
would overwrite the TREE_CHAIN slot), because the two use different
_CHAINs. finish_struct_methods sets up one version of the
TREE_CHAIN slots on the FUNCTION_DECLs.
402
Friends are kept in TREE_LISTs, so that there's no need to use their
404TREE_CHAIN slot for anything.
405
406Has values of:
407
408 TREE_VECs
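
The layout described for CLASSTYPE_METHOD_VEC can be modelled with a
small toy structure. This is a sketch under stated assumptions, not the
compiler's representation: @code{ToyMethod}, @code{ToyMethodVec}, and
@code{chain_length} are hypothetical, and the order of methods within a
chain is chosen arbitrarily here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a FUNCTION_DECL with a same-name chain link.
struct ToyMethod {
  std::string name;
  std::string signature;
  ToyMethod* next_same_name;  // plays the role of the chain slot
};

// Toy method vector: slot 0 is reserved for constructors and destructors;
// each later slot heads the chain of one distinctly named method.
struct ToyMethodVec {
  std::vector<ToyMethod*> slots;
  ToyMethodVec() : slots(1, static_cast<ToyMethod*>(nullptr)) {}

  void add(ToyMethod* m, bool is_ctor_or_dtor) {
    if (is_ctor_or_dtor) {
      m->next_same_name = slots[0];
      slots[0] = m;
      return;
    }
    for (std::size_t i = 1; i < slots.size(); ++i) {
      if (slots[i]->name == m->name) {  // same name: push onto that chain
        m->next_same_name = slots[i];
        slots[i] = m;
        return;
      }
    }
    m->next_same_name = nullptr;
    slots.push_back(m);  // first method with this name
  }
};

// Number of overloads reachable from one slot.
int chain_length(const ToyMethod* m) {
  int n = 0;
  for (; m != nullptr; m = m->next_same_name) ++n;
  return n;
}
```

For a class with one constructor, two overloads of @code{f}, and one
@code{g}, the vector has three slots: the reserved slot 0 plus one slot
per distinct name.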
409
410
411@item CLASSTYPE_VFIELD
412Seems to be in the process of being renamed TYPE_VFIELD. Use on types
413to get the main virtual function table pointer. To get the virtual
414function table use BINFO_VTABLE (TYPE_BINFO ()).
415
416Has values of:
417
418 FIELD_DECLs that are virtual function table pointers
419
420What things can this be used on:
421
422 RECORD_TYPEs
423
424
425@item DECL_CLASS_CONTEXT
426Identifies the context that the _DECL was found in. For virtual function
427tables, it points to the type associated with the virtual function
428table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.
429
The difference between this and DECL_CONTEXT shows up for virtual
functions like:
432
433@example
434struct A
435@{
436 virtual int f ();
437@};
438
439struct B : A
440@{
441 int f ();
442@};
443
444DECL_CONTEXT (A::f) == A
445DECL_CLASS_CONTEXT (A::f) == A
446
447DECL_CONTEXT (B::f) == A
448DECL_CLASS_CONTEXT (B::f) == B
449@end example
450
451Has values of:
452
453 RECORD_TYPEs, or UNION_TYPEs
454
455What things can this be used on:
456
457 TYPE_DECLs, _DECLs
458
459
460@item DECL_CONTEXT
461Identifies the context that the _DECL was found in. Can be used on
462virtual function tables to find the type associated with the virtual
463function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
464better access method. Internally the same as DECL_FIELD_CONTEXT, so
don't use both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
466DECL_CLASS_CONTEXT.
467
468Has values of:
469
470 RECORD_TYPEs
471
472
473What things can this be used on:
474
475@display
476VAR_DECLs that are virtual function tables
477_DECLs
478@end display
479
480
481@item DECL_FIELD_CONTEXT
482Identifies the context that the FIELD_DECL was found in. Internally the
same as DECL_CONTEXT, so don't use both. See also DECL_CONTEXT,
484DECL_FCONTEXT and DECL_CLASS_CONTEXT.
485
486Has values of:
487
488 RECORD_TYPEs
489
490What things can this be used on:
491
492@display
493FIELD_DECLs that are virtual function pointers
494FIELD_DECLs
495@end display
496
497
498@item DECL_NAME
499
500Has values of:
501
502@display
5030 for things that don't have names
504IDENTIFIER_NODEs for TYPE_DECLs
505@end display
506
507@item DECL_IGNORED_P
508A bit that can be set to inform the debug information output routines in
the back-end that a certain _DECL node should be totally ignored.
510
511Used in cases where it is known that the debugging information will be
512output in another file, or where a sub-type is known not to be needed
513because the enclosing type is not needed.
514
A compiler-constructed virtual destructor in a derived class that does
not define an explicit destructor, where one was defined explicitly in a
base class, has this bit set as well. Also used on __FUNCTION__ and
__PRETTY_FUNCTION__ to mark that they are ``compiler generated.'' c-decl
and c-lex.c both want DECL_IGNORED_P set for ``internally generated
vars,'' and ``user-invisible variable.''
521
Functions built by the C++ front-end such as default destructors,
virtual destructors and default constructors want to be marked as
compiler generated, but it is unclear why.
525
526Currently, it is used in an absolute way in the C++ front-end, as an
527optimization, to tell the debug information output routines to not
528generate debugging information that will be output by another separately
529compiled file.
530
531
532@item DECL_VIRTUAL_P
533A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is
534wrong.) Used in VAR_DECLs to indicate that the variable is a vtable.
535It is also used in FIELD_DECLs for vtable pointers.
536
537What things can this be used on:
538
539 FIELD_DECLs and VAR_DECLs
540
541
542@item DECL_VPARENT
543Used to point to the parent type of the vtable if there is one, else it
544is just the type associated with the vtable. Because of the sharing of
545virtual function tables that goes on, this slot is not very useful, and
546is in fact, not used in the compiler at all. It can be removed.
547
548What things can this be used on:
549
550 VAR_DECLs that are virtual function tables
551
552Has values of:
553
554 RECORD_TYPEs maybe UNION_TYPEs
555
556
557@item DECL_FCONTEXT
558Used to find the first baseclass in which this FIELD_DECL is defined.
559See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.
560
561How it is used:
562
563 Used when writing out debugging information about vfield and
564 vbase decls.
565
566What things can this be used on:
567
568 FIELD_DECLs that are virtual function pointers
569 FIELD_DECLs
570
571
572@item DECL_REFERENCE_SLOT
Used to hold the initializer for the reference.
574
575What things can this be used on:
576
577 PARM_DECLs and VAR_DECLs that have a reference type
578
579
580@item DECL_VINDEX
581Used for FUNCTION_DECLs in two different ways. Before the structure
582containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
583FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
584FUNCTION_DECL will replace as a virtual function. When the class is
585laid out, this pointer is changed to an INTEGER_CST node which is
586suitable to find an index into the virtual function table. See
587get_vtable_entry as to how one can find the right index into the virtual
function table. The first index, 0, of a virtual function table is not
used in the normal way, so the first real index is 1.
590
591DECL_VINDEX may be a TREE_LIST, that would seem to be a list of
592overridden FUNCTION_DECLs. add_virtual_function has code to deal with
593this when it uses the variable base_fndecl_list, but it would seem that
somehow, it is possible for the TREE_LIST to persist until method_call,
595and it should not.
596
597
598What things can this be used on:
599
600 FUNCTION_DECLs
601
602
603@item DECL_SOURCE_FILE
604Identifies what source file a particular declaration was found in.
605
606Has values of:
607
608 "<built-in>" on TYPE_DECLs to mean the typedef is built in
609
610
611@item DECL_SOURCE_LINE
612Identifies what source line number in the source file the declaration
613was found at.
614
615Has values of:
616
617@display
6180 for an undefined label
619
6200 for TYPE_DECLs that are internally generated
621
6220 for FUNCTION_DECLs for functions generated by the compiler
623 (not yet, but should be)
624
6250 for ``magic'' arguments to functions, that the user has no
626 control over
627@end display
628
629
630@item TREE_USED
631
632Has values of:
633
634 0 for unused labels
635
636
637@item TREE_ADDRESSABLE
638A flag that is set for any type that has a constructor.
639
640
641@item TREE_COMPLEXITY
They seem to be a kludge for tracking recursion, popping, and pushing.
They only appear in cp-decl.c and cp-decl2.c, so they are a good
candidate for proper fixing and removal.
645
646
647@item TREE_HAS_CONSTRUCTOR
648A flag to indicate when a CALL_EXPR represents a call to a constructor.
If set, we know that the type of the object is the complete type of the
object, and that the value returned is nonnull. When used in this
651fashion, it is an optimization. Can also be used on SAVE_EXPRs to
652indicate when they are of fixed type and nonnull. Can also be used on
653INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.
654
655
656@item TREE_PRIVATE
657Set for FIELD_DECLs by finish_struct. But not uniformly set.
658
659The following routines do something with PRIVATE access:
660build_method_call, alter_access, finish_struct_methods,
661finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
662CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
663GNU_xref_member, dbxout_type_fields, dbxout_type_method_1
664
665
666@item TREE_PROTECTED
667The following routines do something with PROTECTED access:
668build_method_call, alter_access, finish_struct, convert_to_aggr,
669CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
670compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
671dbxout_type_method_1
672
673
674@item TYPE_BINFO
675Used to get the binfo for the type.
676
677Has values of:
678
679 TREE_VECs that are binfos
680
681What things can this be used on:
682
683 RECORD_TYPEs
684
685
686@item TYPE_BINFO_BASETYPES
687See also BINFO_BASETYPES.
688
689@item TYPE_BINFO_VIRTUALS
690A unique list of functions for the virtual function table. See also
691BINFO_VIRTUALS.
692
693What things can this be used on:
694
695 RECORD_TYPEs
696
697
698@item TYPE_BINFO_VTABLE
699Points to the virtual function table associated with the given type.
700See also BINFO_VTABLE.
701
702What things can this be used on:
703
704 RECORD_TYPEs
705
706Has values of:
707
708 VAR_DECLs that are virtual function tables
709
710
711@item TYPE_NAME
712Names the type.
713
714Has values of:
715
716@display
7170 for things that don't have names.
718should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
719 ENUM_TYPEs.
720TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
721 shouldn't be.
722TYPE_DECL for typedefs, unsure why.
723@end display
724
725What things can one use this on:
726
727@display
728TYPE_DECLs
729RECORD_TYPEs
730UNION_TYPEs
731ENUM_TYPEs
732@end display
733
734History:
735
736 It currently points to the TYPE_DECL for RECORD_TYPEs,
737 UNION_TYPEs and ENUM_TYPEs, but it should be history soon.
738
739
740@item TYPE_METHODS
741Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with
742@code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a
743class.
744
745
746@item TYPE_DECL
Used to represent typedefs, and used to represent binding layers.
748
749Components:
750
751 DECL_NAME is the name of the typedef. For example, foo would
752 be found in the DECL_NAME slot when @code{typedef int foo;} is
753 seen.
754
755 DECL_SOURCE_LINE identifies what source line number in the
756 source file the declaration was found at. A value of 0
757 indicates that this TYPE_DECL is just an internal binding layer
 marker, and does not correspond to a user supplied typedef.
759
760 DECL_SOURCE_FILE
761
762@item TYPE_FIELDS
763A linked list (via @code{TREE_CHAIN}) of member types of a class. The
764list can contain @code{TYPE_DECL}s, but there can also be other things
765in the list apparently. See also @code{CLASSTYPE_TAGS}.
766
767
768@item TYPE_VIRTUAL_P
769A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is
770a virtual function table or a pointer to one. When used on a
771@code{FUNCTION_DECL}, indicates that it is a virtual function. When
772used on an @code{IDENTIFIER_NODE}, indicates that a function with this
773same name exists and has been declared virtual.
774
775When used on types, it indicates that the type has virtual functions, or
776is derived from one that does.
777
778Not sure if the above about virtual function tables is still true. See
779also info on @code{DECL_VIRTUAL_P}.
780
781What things can this be used on:
782
783 FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs
784
785
786@item VF_BASETYPE_VALUE
787Get the associated type from the binfo that caused the given vfield to
788exist. This is the least derived class (the most parent class) that
789needed a virtual function table. It is probably the case that all uses
790of this field are misguided, but they need to be examined on a
791case-by-case basis. See history for more information on why the
792previous statement was made.
793
794Set at @code{finish_base_struct} time.
795
796What things can this be used on:
797
798 TREE_LISTs that are vfields
799
800History:
801
802 This field was used to determine if a virtual function table's
803 slot should be filled in with a certain virtual function, by
804 checking to see if the type returned by VF_BASETYPE_VALUE was a
805 parent of the context in which the old virtual function existed.
806 This incorrectly assumes that a given type _could_ not appear as
807 a parent twice in a given inheritance lattice. For single
808 inheritance, this would in fact work, because a type could not
809 possibly appear more than once in an inheritance lattice, but
810 with multiple inheritance, a type can appear more than once.
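
The point made in the history note, that a type can appear more than once
in an inheritance lattice, is easy to demonstrate from user code. This is
a minimal standalone sketch with invented class and function names.

```cpp
#include <cassert>

struct A { int x; };
struct B : A {};
struct C : A {};
struct D : B, C {};  // contains two distinct A subobjects

// Address of the A subobject reached through B.
const A* a_via_b(const D& d) { return static_cast<const B*>(&d); }

// Address of the A subobject reached through C.
const A* a_via_c(const D& d) { return static_cast<const C*>(&d); }
```

Because @code{A} is a non-virtual base of both @code{B} and @code{C}, the
two pointers refer to different subobjects, so asking whether @code{A} is
a parent of @code{D} does not identify a single place in the lattice.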
811
812
813@item VF_BINFO_VALUE
814Identifies the binfo that caused this vfield to exist. If this vfield
815is from the first direct base class that has a virtual function table,
816then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
817direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL}
818on result to find out if it is a virtual base class. Related to the
819binfo found by
820
821@example
822get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
823@end example
824
825@noindent
826where @samp{t} is the type that has the given vfield.
827
828@example
829get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
830@end example
831
832@noindent
will return the binfo for the given vfield.
834
835May or may not be set at @code{modify_vtable_entries} time. Set at
836@code{finish_base_struct} time.
837
838What things can this be used on:
839
840 TREE_LISTs that are vfields
841
842
843@item VF_DERIVED_VALUE
844Identifies the type of the most derived class of the vfield, excluding
the class this vfield is for.
846
847Set at @code{finish_base_struct} time.
848
849What things can this be used on:
850
851 TREE_LISTs that are vfields
852
853
854@item VF_NORMAL_VALUE
855Identifies the type of the most derived class of the vfield, including
856the class this vfield is for.
857
858Set at @code{finish_base_struct} time.
859
860What things can this be used on:
861
862 TREE_LISTs that are vfields
863
864
865@item WRITABLE_VTABLES
This is an option that can be defined when building the compiler, that
will cause the compiler to output vtables into the data segment so that
the vtables may be written. This is undefined by default, because
normally the vtables should be unwritable. People who implement object
I/O facilities, or who want to change the dynamic type of objects, may
want to have the vtables writable. Another way of achieving this would
be to make a copy of the vtable into writable memory, but the drawback
there is that that method only changes the type for one object.
874
875@end table
876
877@node Typical Behavior, Coding Conventions, Macros, Top
878@section Typical Behavior
879
880@cindex parse errors
881
882Whenever seemingly normal code fails with errors like
883@code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
884returning a NULL_TREE for whatever reason.
885
886@node Coding Conventions, Templates, Typical Behavior, Top
887@section Coding Conventions
888
It should never be the case that trees are modified in-place by the
890back-end, @emph{unless} it is guaranteed that the semantics are the same
891no matter how shared the tree structure is. @file{fold-const.c} still
892has some cases where this is not true, but rms hypothesizes that this
893will never be a problem.
894
895@node Templates, Access Control, Coding Conventions, Top
896@section Templates
897
A template is represented by a @code{TEMPLATE_DECL}. The specific
fields used are:

@table @code
@item DECL_TEMPLATE_RESULT
The generic decl on which instantiations are based. This looks just
like any other decl.

@item DECL_TEMPLATE_PARMS
The parameters to this template.
@end table

The generic decl is parsed as much like any other decl as possible,
given the parameterization. The template decl is not built up until the
generic decl has been completed. For template classes, a template decl
is generated for each member function and static data member, as well.

Template members of template classes are represented by a TEMPLATE_DECL
916for the class' parameters around another TEMPLATE_DECL for the member's
917parameters.
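
At the source level, the nesting of parameter lists described above
corresponds to a member template inside a class template. The following
is a small illustration of that source construct only; @code{Box} and
@code{as} are invented names, and the sketch says nothing about how the
two TEMPLATE_DECLs are actually built.

```cpp
#include <cassert>

// Outer template parameters (T) belong to the class template.
template <typename T>
struct Box {
  T value;

  // Inner template parameters (U) belong to the member template, which
  // is nested inside the parameterization by T.
  template <typename U>
  U as() const;
};

// An out-of-class definition spells out both parameter lists, outer first.
template <typename T>
template <typename U>
U Box<T>::as() const {
  return static_cast<U>(value);
}
```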
918
919All declarations that are instantiations or specializations of templates
920refer to their template and parameters through DECL_TEMPLATE_INFO.

How should I handle parsing member functions with the proper param
decls? Set them up again or try to use the same ones? Currently we do
the former. We can probably do this without any extra machinery in
store_pending_inline, by deducing the parameters from the decl in
do_pending_inlines. PRE_PARSED_TEMPLATE_DECL?

If a base is a parm, we can't check anything about it. If a base is not
a parm, we need to check it for name binding. Do finish_base_struct if
no bases are parameterized (only if none, including indirect, are
parms). Nah, don't bother trying to do any of this until instantiation;
we only need to do name binding in advance.

Always set up method vec and fields, inc. synthesized methods. Really?
We can't know the types of the copy folks, or whether we need a
destructor, or can have a default ctor, until we know our bases and
fields. Otherwise, we can assume and fix ourselves later. Hopefully.

@node Access Control, Error Reporting, Templates, Top
@section Access Control
The function compute_access returns one of three values:

@table @code
@item access_public
means that the field can be accessed by the current lexical scope.

@item access_protected
means that the field cannot be accessed by the current lexical scope
because it is protected.

@item access_private
means that the field cannot be accessed by the current lexical scope
because it is private.
@end table

DECL_ACCESS is used for access declarations; alter_access creates a list
of types and accesses for a given decl.
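
The three access levels, and the effect of an access declaration, can be
seen at the source level in the sketch below. The class names are
invented for illustration, and the comments describe what an access check
would be expected to conclude, not the literal return values of
compute_access. The example uses a using-declaration, which is the
current spelling of the older access declaration @code{Base::hidden;}.

```cpp
// Hypothetical classes illustrating the three access levels.
class Base {
public:
    int open() const { return 1; }      // accessible from any scope
    int hidden() const { return 4; }    // public here, hidden by Derived's
                                        // private inheritance
protected:
    int guarded() const { return 2; }   // accessible from Base and derived classes
private:
    int secret() const { return 3; }    // accessible only inside Base
};

class Derived : private Base {
public:
    // Access declaration (written as a using-declaration): re-grants
    // public access to a member that private inheritance would hide.
    using Base::hidden;

    int sum() const { return open() + guarded(); }  // both reachable here
    // int bad() const { return secret(); }         // error: secret is private
};

int access_example() {
    Derived d;
    // d.open();   // error: Base is a private base of Derived
    return d.hidden() + d.sum();   // 4 + (1 + 2)
}
```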

Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
codes of compute_access and were used as a cache for compute_access.
Now they are not used at all.

TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
granted by the containing class. BEWARE: TREE_PUBLIC means something
completely unrelated to access control!

@node Error Reporting, Parser, Access Control, Top
@section Error Reporting

The C++ front-end uses a call-back mechanism to allow functions to print
out reasonable strings for types and functions without putting extra
logic in the functions where errors are found. The interface is through
the @code{cp_error} function (or @code{cp_warning}, etc.). The
syntax is exactly like that of @code{error}, except that a few more
conversions are supported:

@itemize @bullet
@item
%C indicates a value of `enum tree_code'.
@item
%D indicates a *_DECL node.
@item
%E indicates a *_EXPR node.
@item
%L indicates a value of `enum languages'.
@item
%P indicates the name of a parameter (i.e. "this", "1", "2", ...)
@item
%T indicates a *_TYPE node.
@item
%O indicates the name of an operator (MODIFY_EXPR -> "operator =").

@end itemize

There is some overlap between these; for instance, any of the node
options can be used for printing an identifier (though only @code{%D}
tries to decipher function names).

For a more verbose message (@code{class foo} as opposed to just @code{foo},
including the return type for functions), use @code{%#c}.
To have the line number on the error message indicate the line of the
DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
use @code{%+D}, or it will default to the first.
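
The call-back idea can be sketched outside the compiler as a small table
that maps each conversion letter to a printer function. Everything below
(the @code{Node} struct, @code{format_message}, and the printer
functions) is hypothetical and only mimics the shape of the mechanism for
two of the conversions; it is not the code behind @code{cp_error}.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree node: just a kind and a name.
struct Node {
    enum Kind { Decl, Type } kind;
    std::string name;
};

typedef std::string (*Printer)(const Node &);

static std::string print_decl(const Node &n) { return "declaration `" + n.name + "'"; }
static std::string print_type(const Node &n) { return "type `" + n.name + "'"; }

// Look up the call-back registered for a conversion letter.
static Printer printer_for(char conversion) {
    switch (conversion) {
    case 'D': return print_decl;
    case 'T': return print_type;
    default:  return 0;
    }
}

// Expand %D and %T in FORMAT, consuming ARGS left to right.
// Unknown conversions are copied through unchanged.
std::string format_message(const std::string &format,
                           const std::vector<Node> &args) {
    std::string out;
    std::size_t next = 0;
    for (std::size_t i = 0; i < format.size(); ++i) {
        if (format[i] == '%' && i + 1 < format.size()) {
            Printer p = printer_for(format[i + 1]);
            if (p && next < args.size()) {
                out += p(args[next++]);
                ++i;            // skip the conversion letter
                continue;
            }
        }
        out += format[i];
    }
    return out;
}
```

The point of the design is that the function reporting the error only
passes nodes; the knowledge of how to print each kind of node lives in
one place.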

@node Parser, Copying Objects, Error Reporting, Top
@section Parser

Some comments on the parser:

The @code{after_type_declarator} / @code{notype_declarator} hack is
necessary in order to allow redeclarations of @code{TYPENAME}s, for
instance

@example
typedef int foo;
class A @{
  char *foo;
@};
@end example

In the above, the first @code{foo} is parsed as a @code{notype_declarator},
and the second as an @code{after_type_declarator}.

Ambiguities:

There are currently four reduce/reduce ambiguities in the parser. They are:

1) Between @code{template_parm} and
@code{named_class_head_sans_basetype}, for the tokens @code{aggr
identifier}. This situation occurs in code looking like

@example
template <class T> class A @{ @};
@end example

It is ambiguous whether @code{class T} should be parsed as the
declaration of a template type parameter named @code{T} or an unnamed
constant parameter of type @code{class T}. Section 14.6, paragraph 3 of
the January '94 working paper states that the first interpretation is
the correct one. This ambiguity results in two reduce/reduce conflicts.

2) Between @code{primary} and @code{type_id} for code like @samp{int()}
in places where both can be accepted, such as the argument to
@code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies
that these ambiguous constructs will be interpreted as @code{typename}s.
This ambiguity results in six reduce/reduce conflicts between
@samp{absdcl} and @samp{functional_cast}.

3) Between @code{functional_cast} and
@code{complex_direct_notype_declarator}, for various token strings.
This situation occurs in code looking like

@example
int (*a);
@end example

This code is ambiguous; it could be a declaration of the variable
@samp{a} as a pointer to @samp{int}, or it could be a functional cast of
@samp{*a} to @samp{int}. Section 6.8 specifies that the former
interpretation is correct. This ambiguity results in 7 reduce/reduce
conflicts. Another aspect of this ambiguity is code like @samp{int (x[2]);},
which is resolved at the @samp{[} and accounts for 6 reduce/reduce conflicts
between @samp{direct_notype_declarator} and
@samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r
conflicts between @samp{expr_or_declarator} and @samp{primary} over code
like @samp{int (a);}, which could probably be resolved but would also
probably be more trouble than it's worth. In all, this situation
accounts for 17 conflicts. Ack!
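
The resolution in favor of declarations is visible from ordinary source
code. In the sketch below (the function name is made up), each
parenthesized form is accepted as a declaration, after which the declared
names are usable as variables.

```cpp
// Each statement marked "declaration" could also be read as a
// functional cast; the language rule picks the declaration.
int declaration_wins() {
    int value = 5;
    int *p = &value;

    int (a);        // declaration of `a', not a cast of some existing a
    int (*b);       // declaration of `b' as pointer to int
    int (x[2]);     // declaration of `x' as array of two ints

    a = 1;
    b = p;
    x[0] = 2;
    x[1] = 3;
    return a + *b + x[0] + x[1];
}
```

A compiler may warn about the redundant parentheses, but the three
statements are declarations rather than expression statements.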

The second case above is responsible for the failure to parse
@samp{LinppFile ppfile (String (argv[1]), &outs, argc, argv);} (from
Rogue Wave Math.h++) as an object declaration, and must be fixed so that
it does not resolve until later.

4) Indirectly between @code{after_type_declarator} and @code{parm}, for
type names. This occurs in (as one example) code like

@example
typedef int foo, bar;
class A @{
  foo (bar);
@};
@end example

What is @code{bar} inside the class definition? We currently interpret
it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
@code{after_type_declarator}. I believe that xlC is correct, in light
of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
could possibly be a type name is taken as the @i{decl-specifier-seq} of
a @i{declaration}." However, it seems clear that this rule must be
violated in the case of constructors. This ambiguity accounts for 8
conflicts.

Unlike the others, this ambiguity is not recognized by the Working Paper.

@node Copying Objects, Exception Handling, Parser, Top
@section Copying Objects

The generated copy assignment operator in g++ does not currently do the
right thing for multiple inheritance involving virtual bases; it just
calls the copy assignment operators for its direct bases. What it
should probably do is:

1) Split up the copy assignment operator for all classes that have
vbases into "copy my vbases" and "copy everything else" parts. Or do
the trickiness that the constructors do to ensure that vbases don't get
initialized by intermediate bases.

2) Wander through the class lattice, find all vbases for which no
intermediate base has a user-defined copy assignment operator, and call
their "copy everything else" routines. If not all of my vbases satisfy
this criterion, warn, because this may be surprising behavior.

3) Call the "copy everything else" routine for my direct bases.

If we only have one direct base, we can just foist everything off onto
them.
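
The behavior in question can be observed with a diamond-shaped hierarchy
whose virtual base counts its own assignments. The class names are
invented for this sketch. How many times the virtual base is assigned by
the generated operators is not fixed by the language, so the comments
deliberately do not promise a particular count.

```cpp
// Hypothetical diamond: Left and Right share one virtual Base.
struct Base {
    int assignments;
    Base() : assignments(0) {}
    Base &operator=(const Base &) {
        ++assignments;          // count every time Base is assigned
        return *this;
    }
};

struct Left : virtual Base {};
struct Right : virtual Base {};
struct Bottom : Left, Right {};

// Assign one Bottom to another using only generated operators and
// report how often the shared virtual base was assigned.
int count_vbase_assignments() {
    Bottom a, b;
    a = b;
    return a.assignments;
}
```

A count above one means the virtual base was assigned once per
inheritance path, which is the situation the three steps above aim to
avoid.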

This issue is currently under discussion in the core reflector
(2/28/94).

@node Exception Handling, Free Store, Copying Objects, Top
@section Exception Handling

Note, exception handling in g++ is still under development.

This section describes the mapping of C++ exceptions in the C++
front-end, into the back-end exception handling framework.

The basic mechanism of exception handling in the back-end is
unwind-protect a la elisp. This is a general, robust, and language
independent representation for exceptions.

The C++ front-end exceptions are mapped into the unwind-protect
semantics by the C++ front-end. The mapping is described below.

When -frtti is used, rtti is used to do exception object type checking;
when it isn't used, the encoded name for the type of the object being
thrown is used instead. All code that originates exceptions, even code
that throws exceptions as a side effect, like dynamic casting, and all
code that catches exceptions must be compiled with either -frtti, or
-fno-rtti. It is not possible to mix rtti-based exception handling
objects with code that doesn't use rtti. The exceptions to this are
code that doesn't catch or throw exceptions, catch (...), and code that
just rethrows an exception.

Currently we use the normal mangling used in building function names
(int's are "i", const char * is PCc) to build the non-rtti-based type
descriptors for exception handling. These descriptors are just plain
NULL terminated strings, and internally they are passed around as char
*.

In C++, all cleanups should be protected by exception regions. The
region starts just after the reason why the cleanup is created has
ended. For example, with an automatic variable that has a constructor,
it would be right after the constructor is run. The region ends just
before the finalization is expanded. Since the backend may expand the
cleanup multiple times along different paths, once for normal end of the
region, once for non-local gotos, once for returns, etc, the backend
must take special care to protect the finalization expansion, if the
expansion is for any other reason than normal region end, and it is
`inline' (it is inside the exception region). The backend can either
choose to move them out of line, or it can create an exception region
over the finalization to protect it, and in the handler associated with
it, it would not run the finalization as it otherwise would have, but
rather just rethrow to the outer handler, careful to skip the normal
handler for the original region.

In Ada, they will use the more runtime intensive approach of having
fewer regions, but at the cost of additional work at run time, to keep a
list of things that need cleanups. When a variable has finished
construction, they add the cleanup to the list; when they come to the end
of the lifetime of the variable, they run the list down. If they take a
hit before the section finishes normally, they examine the list for
actions to perform. I hope they add this logic into the back-end, as it
would be nice to get that alternative approach in C++.

On an rs6000, xlC stores exception objects on the stack, under the try
block. When it unwinds down into a handler, the frame pointer is
adjusted back to the normal value for the frame in which the handler
resides, and the stack pointer is left unchanged from the time at which
the object was thrown. This is so that there is always someplace for
the exception object, and nothing can overwrite it, once we start
throwing. The only bad part is that the stack remains large.

The below points out some things that work in g++'s exception handling.

All completely constructed temps and local variables are cleaned up in
all unwound scopes. Completely constructed parts of partially
constructed objects are cleaned up. This includes partially built
arrays. Exception specifications are now handled. Thrown objects are
now cleaned up all the time. We can now tell if we have an active
exception being thrown or not (__eh_type != 0). We use this to call
terminate if someone does a @code{throw;} without there being an active
exception object. uncaught_exception () works. Exception handling
should work right if you optimize. Exception handling should work with
-fpic or -fPIC.
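
The cleanup guarantee for partially constructed objects can be checked
from user code. In this sketch (all names hypothetical) a constructor
throws after one member has been fully constructed; only that member's
destructor should run during unwinding.

```cpp
static int live_parts = 0;   // number of Part objects currently alive
static int destroyed = 0;    // destructor bookkeeping, see below

struct Part {
    Part() { ++live_parts; }
    ~Part() { --live_parts; ++destroyed; }
};

struct Whole {
    Part first;                      // fully constructed before the throw
    Whole() { throw 42; }            // Whole itself never finishes construction
    ~Whole() { destroyed += 100; }   // must not run for a partial object
};

// Returns true if the thrown value arrived and the completed member
// was cleaned up exactly once, without running ~Whole.
bool partial_cleanup_works() {
    live_parts = 0;
    destroyed = 0;
    try {
        Whole w;
    } catch (int e) {
        return e == 42 && live_parts == 0 && destroyed == 1;
    }
    return false;
}
```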

The below points out some flaws in g++'s exception handling, as it now
stands.

Only exact type matching or reference matching of throw types works when
-fno-rtti is used. Exception handling only works on a SPARC (like Suns)
(both -mflat and -mno-flat models work), SPARClite, Hitachi SH, i386,
arm, rs6000, PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9
may not work. HPPA is mostly done, but throwing between a shared library
and user code doesn't yet work. Some targets have support for data-driven
unwinding. Partial support is in for all other machines, but a stack
unwinder called __unwind_function has to be written, and added to
libgcc2 for them. The new EH code doesn't rely upon the
__unwind_function for C++ code; instead it creates per function
unwinders right inside the function. Unfortunately, on many platforms
the definition of RETURN_ADDR_RTX in the tm.h file for the machine port
is wrong. See below for details on __unwind_function. RTL_EXPRs for EH
cond variables for && and || exprs should probably be wrapped in
UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved.

We only do pointer conversions on exception matching a la 15.3 p2 case
3: `A handler with type T, const T, T&, or const T& is a match for a
throw-expression with an object of type E if [3]T is a pointer type and
E is a pointer type that can be converted to T by a standard pointer
conversion (_conv.ptr_) not involving conversions to pointers to private
or protected base classes.' when -frtti is given.
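
The quoted rule can be exercised with a short program. The types below
are made up; the handler for @code{Shape *} is expected to match a thrown
@code{Circle *} because a derived-to-base pointer conversion through a
public base is a standard pointer conversion.

```cpp
struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};          // public base (struct default)

// Throws a pointer to the derived type and reports which handler ran:
// 1 for the base-pointer handler, 2 for the catch-all.
int which_handler() {
    static Circle circle;          // static so the pointer stays valid
    try {
        throw &circle;             // thrown type is Circle *
    } catch (Shape *) {
        return 1;                  // matched via pointer conversion
    } catch (...) {
        return 2;
    }
}
```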

We don't call delete on new expressions that die because the ctor threw
an exception. See except/18 for a test case.

15.2 para 13: The exception being handled should be rethrown if control
reaches the end of a handler of the function-try-block of a constructor
or destructor; right now, it is not.

15.2 para 12: If a return statement appears in a handler of a
function-try-block of a constructor, the program is ill-formed, but this
isn't diagnosed.

15.2 para 11: If the handlers of a function-try-block contain a jump
into the body of a constructor or destructor, the program is ill-formed,
but this isn't diagnosed.

15.2 para 9: Check that the fully constructed base classes and members
of an object are destroyed before entering the handler of a
function-try-block of a constructor or destructor for that object.

build_exception_variant should sort the incoming list, so that it
implements set compares, not exact list equality. Type smashing should
smash exception specifications using set union.
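
A set comparison of the kind suggested for build_exception_variant can
be sketched with sorted lists. Type names are represented as plain
strings here purely for illustration; the front end works on tree lists,
and the helper names below (@code{canonical}, @code{same_spec},
@code{merge_specs}) are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef std::vector<std::string> SpecList;

// Put a specification into canonical form: sorted, without duplicates.
SpecList canonical(SpecList spec) {
    std::sort(spec.begin(), spec.end());
    spec.erase(std::unique(spec.begin(), spec.end()), spec.end());
    return spec;
}

// Two specifications are equivalent if they name the same set of types,
// regardless of the order in which the types were written.
bool same_spec(const SpecList &a, const SpecList &b) {
    return canonical(a) == canonical(b);
}

// Merging two specifications takes the union of their sets.
SpecList merge_specs(const SpecList &a, const SpecList &b) {
    SpecList all(a);
    all.insert(all.end(), b.begin(), b.end());
    return canonical(all);
}
```

Sorting once when the variant is built would let later comparisons be a
plain element-by-element check.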

Thrown objects are usually allocated on the heap, in the usual way. If
one runs out of heap space, throwing an object will probably never work.
This could be relaxed some by passing an __in_chrg parameter to track
who has control over the exception object. Thrown objects are not
allocated on the heap when they are pointer to object types. We should
extend it so that all small (<4*sizeof(void*)) objects are stored
directly, instead of allocated on the heap.

When the backend returns a value, it can create new exception regions
that need protecting. The new region should rethrow the object in
context of the last associated cleanup that ran to completion.

The structure of the code that is generated for C++ exception handling
is shown below:

@example
Ln:                                     throw value;
        copy value onto heap
        jump throw (Ln, id, address of copy of value on heap)

                                        try @{
+Lstart:        the start of the main EH region
|...                                        ...
+Lend:          the end of the main EH region
                                        @} catch (T o) @{
                                            ...1
                                        @}
Lresume:
        nop     used to make sure there is something before
                the next region ends, if there is one
...                                     ...

        jump Ldone
[
Lmainhandler:   handler for the region Lstart-Lend
        cleanup
] zero or more, depending upon automatic vars with dtors
+Lpartial:
|       jump Lover
+Lhere:
        rethrow (Lhere, same id, same obj);
Lterm:          handler for the region Lpartial-Lhere
        call terminate
Lover:
[
 [
        call throw_type_match
        if (eq) @{
 ] these lines disappear when there is no catch condition
+Lsregion2:
|       ...1
|       jump Lresume
|Lhandler:      handler for the region Lsregion2-Leregion2
|       rethrow (Lresume, same id, same obj);
+Leregion2
        @}
] there are zero or more of these sections, depending upon how many
  catch clauses there are
----------------------------- expand_end_all_catch --------------------------
                here we have fallen off the end of all catch
                clauses, so we rethrow to outer
        rethrow (Lresume, same id, same obj);
----------------------------- expand_end_all_catch --------------------------
[
L1:     maybe throw routine
] depending upon if we have expanded it or not
Ldone:
        ret

start_all_catch emits labels: Lresume,

@end example

The __unwind_function takes a pointer to the throw handler, and is
expected to pop the stack frame that was built to call it, as well as
the frame underneath and then jump to the throw handler. It must
restore all registers to their proper values as well as all other
machine state as determined by the context in which we are unwinding
into. The way I normally start is to compile:

        void *g;
        foo(void* a) @{ g = a; @}

with -S, and change the thing that alters the PC (return, or ret
usually) to not alter the PC, making sure to leave all other semantics
(like adjusting the stack pointer, or frame pointers) in. After that,
replicate the prologue once more at the end, again, changing the PC
altering instructions, and finally, at the very end, jump to `g'.

It takes about a week to write this routine. If someone wants to
volunteer to write this routine for any architecture, exception support
for that architecture will be added to g++. Please send in those code
donations. One other thing that needs to be done is to double check
that __builtin_return_address (0) works.

@subsection Specific Targets

For the alpha, the __unwind_function will be something resembling:

@example
void
__unwind_function(void *ptr)
@{
  /* First frame */
  asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
  asm ("bis $15, $15, $30"); /* reload sp with the fp we found */

  /* Second frame */
  asm ("ldq $15, 8($30)"); /* fp */
  asm ("bis $15, $15, $30"); /* reload sp with the fp we found */

  /* Return */
  asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
@}
@end example

@noindent
However, there are a few problems preventing it from working. First of
all, the gcc-internal function @code{__builtin_return_address} needs to
work given an argument of 0 for the alpha. As it stands as of August
30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
will definitely not work on the alpha. Instead, we need to define
the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
@code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
definition for @code{RETURN_ADDR_RTX}.

In addition (and more importantly), we need a way to reliably find the
frame pointer on the alpha. The use of the value 8 above to restore the
frame pointer (register 15) is incorrect. On many systems, the frame
pointer is consistently offset to a specific point on the stack. On the
alpha, however, the frame pointer is pushed last. First the return
address is stored, then any other registers are saved (e.g., @code{s0}),
and finally the frame pointer is put in place. So @code{fp} could have
an offset of 8, but if the calling function saved any registers at all,
they add to the offset.

The only places the frame size is noted are with the @samp{.frame}
directive, for use by the debugger and the OSF exception handling model
(useless to us), and in the initial computation of the new value for
@code{sp}, the stack pointer. For example, the function may start with:

@example
lda $30,-32($30)
.frame $15,32,$26,0
@end example

@noindent
The 32 above is exactly the value we need. With this, we can be sure
that the frame pointer is stored 8 bytes less (in this case, at 24(sp)).
The drawback is that there is no way that I (Brendan) have found to let
us discover the size of a previous frame @emph{inside} the definition
of @code{__unwind_function}.

So to accomplish exception handling support on the alpha, we need two
things: first, a way to figure out where the frame pointer was stored,
and second, a functional @code{__builtin_return_address} implementation
for except.c to be able to use it.

Or just support DWARF 2 unwind info.

@subsection New Backend Exception Support

This subsection discusses various aspects of the design of the
data-driven model being implemented for the exception handling backend.

The goal is to generate enough data during the compilation of user code,
such that we can dynamically unwind through functions at run time with a
single routine (@code{__throw}) that lives in libgcc.a, built by the
compiler, and dispatch into associated exception handlers.

This information is generated by the DWARF 2 debugging backend, and
includes all of the information __throw needs to unwind an arbitrary
frame. It specifies where all of the saved registers and the return
address can be found at any point in the function.

Major disadvantages when enabling exceptions are:

@itemize @bullet
@item
Code that would otherwise use caller-saved registers cannot do so when
flow can be transferred into that code from an exception handler. In
high performance code this should not usually be true, so the effects
should be minimal.

@end itemize

@subsection Backend Exception Support

The backend must be extended to fully support exceptions. Right now
there are a few hooks from that backend into the alpha exception
handling backend that resides in the C++ frontend, and these allow
exception handling to work in g++. An exception region is a segment of
generated code that has a handler associated with it. The exception
regions are denoted in the generated code as address ranges, each given
by a starting PC value and an ending PC value of the region. Some of the
limitations with this scheme are:

@itemize @bullet
@item
The backend replicates insns for such things as loop unrolling and
function inlining. Right now, there are no hooks into the frontend's
exception handling backend to handle the replication of insns. When
replication happens, a new exception region descriptor needs to be
generated for the new region.

@item
The backend expects to be able to rearrange code, for things like jump
optimization. Any rearranging of the code needs to have exception region
descriptors updated appropriately.

@item
The backend can eliminate dead code. Any associated exception region
descriptor that refers to fully contained code that has been eliminated
should also be removed, although not doing this is harmless in terms of
semantics.

@end itemize

The above is not meant to be exhaustive, but does include all things I
have thought of so far. I am sure other limitations exist.

Below are some notes on the migration of the exception handling code
backend from the C++ frontend to the backend.

NOTEs are to be used to denote the start of an exception region, and the
end of the region. I presume that the interface used to generate these
notes in the backend would be two functions, start_exception_region and
end_exception_region (or something like that). The frontends are
required to call them in pairs. When marking the end of a region, an
argument can be passed to indicate the handler for the marked region.
This can be passed in many ways; currently a tree is used. Another
possibility would be insns for the handler, or a label that denotes a
handler. I have a feeling insns might be the best way to pass it.
Semantics are, if an exception is thrown inside the region, control is
transferred unconditionally to the handler. If control passes through
the handler, then the backend is to rethrow the exception, in the
context of the end of the original region. The handler is protected by
the conventional mechanisms; it is the frontend's responsibility to
protect the handler, if special semantics are required.

This is a very low level view, and it would be nice if the backend
supported a somewhat higher level view in addition to this view. This
higher level could include source line number, name of the source file,
name of the language that threw the exception and possibly the name of
the exception. Kenner may want to rope you into doing more than just
the basics required by C++. You will have to resolve this. He may want
you to do support for non-local gotos, first scan for exception handler,
if none is found, allow the debugger to be entered, without any cleanups
being done. To do this, the backend would have to know the difference
between a cleanup-rethrower, and a real handler; it would also have to
have a way to know if a handler `matches' a thrown exception, and this
is frontend specific.

The stack unwinder is one of the hardest parts to do. It is highly
machine dependent. The form that Kenner seems to like was a couple of
macros, that would do the machine dependent grunt work. One preexisting
function that might be of some use is __builtin_return_address (). One
macro he seemed to want was __builtin_return_address, and the other
would do the hard work of fixing up the registers, adjusting the stack
pointer, frame pointer, arg pointer and so on.


@node Free Store, Mangling, Exception Handling, Top
@section Free Store

@code{operator new []} adds a magic cookie to the beginning of arrays
for which the number of elements will be needed by @code{operator delete
[]}. These are arrays of objects with destructors and arrays of objects
that define @code{operator delete []} with the optional size_t argument.
This cookie can be examined from a program as follows:

@example
#include <stddef.h>
#include <stdio.h>

/* Relies on the cookie layout described above, which is specific
   to g++; other implementations may store the count differently.  */
size_t nelts (void *p)
@{
  struct cookie @{
    size_t nelts __attribute__ ((aligned (sizeof (double))));
  @};

  cookie *cp = (cookie *)p;
  --cp;

  return cp->nelts;
@}

struct A @{
  ~A() @{ @}
@};

int main()
@{
  A *ap = new A[3];
  printf ("%lu\n", (unsigned long) nelts (ap));
  delete [] ap;
  return 0;
@}
@end example

@section Linkage
The linkage code in g++ is horribly twisted in order to meet two design goals:

1) Avoid unnecessary emission of inlines and vtables.

2) Support pedantic assemblers like the one in AIX.

To meet the first goal, we defer emission of inlines and vtables until
the end of the translation unit, where we can decide whether or not they
are needed, and how to emit them if they are.

@node Mangling, Concept Index, Free Store, Top
@section Function name mangling for C++ and Java

Both C++ and Java provide overloaded functions and methods,
which are methods with the same name but different parameter lists.
Selecting the correct version is done at compile time.
Though the overloaded functions have the same name in the source code,
they need to be translated into different assembler-level names,
since typical assemblers and linkers cannot handle overloading.
This process of encoding the parameter types with the method name
into a unique name is called @dfn{name mangling}. The inverse
process is called @dfn{demangling}.

It is convenient that C++ and Java use compatible mangling schemes,
since this makes life easier for tools such as gdb, and it eases
integration between C++ and Java.

Note there is also a standard "Java Native Interface" (JNI) which
implements a different calling convention, and uses a different
mangling scheme. The JNI is a rather abstract ABI so Java can call methods
written in C or C++;
we are concerned here about a lower-level interface primarily
intended for methods written in Java, but that can also be used for C++
(and less easily C).

Note that on systems that follow BSD tradition, a C identifier @code{var}
would get "mangled" into the assembler name @samp{_var}. On such
systems, all other mangled names are also prefixed by a @samp{_}
which is not shown in the following examples.

@subsection Method name mangling

C++ mangles a method by emitting the function name, followed by @code{__},
followed by encodings of any method qualifiers (such as @code{const}),
followed by the mangling of the method's class,
followed by the mangling of the parameters, in order.

For example @code{Foo::bar(int, long) const} is mangled
as @samp{bar__C3Fooil}.

For a constructor, the method name is left out.
That is @code{Foo::Foo(int, long)} is mangled
as @samp{__3Fooil}.

GNU Java does the same.

@subsection Primitive types

The C++ types @code{int}, @code{long}, @code{short}, @code{char},
and @code{long long} are mangled as @samp{i}, @samp{l},
@samp{s}, @samp{c}, and @samp{x}, respectively.
The corresponding unsigned types have @samp{U} prefixed
to the mangling. The type @code{signed char} is mangled @samp{Sc}.

The C++ and Java floating-point types @code{float} and @code{double}
are mangled as @samp{f} and @samp{d} respectively.

The C++ @code{bool} type and the Java @code{boolean} type are
mangled as @samp{b}.

The C++ @code{wchar_t} and the Java @code{char} types are
mangled as @samp{w}.

The Java integral types @code{byte}, @code{short}, @code{int}
and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
and @samp{x}, respectively.

C++ code that has included @code{javatypes.h} will mangle
the typedefs @code{jbyte}, @code{jshort}, @code{jint}
and @code{jlong} as @samp{c}, @samp{s}, @samp{i},
and @samp{x}, respectively. (This has not been implemented yet.)
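
The codes listed in this section fit in two small lookup tables. The
following Python sketch only restates the text; the names
@code{CXX_CODES}, @code{JAVA_CODES} and @code{mangle_cxx_primitive} are
made up for the illustration.

```python
# Codes copied from the text above; the table and function names are hypothetical.
CXX_CODES = dict([
    ("int", "i"), ("long", "l"), ("short", "s"), ("char", "c"),
    ("long long", "x"), ("float", "f"), ("double", "d"),
    ("bool", "b"), ("wchar_t", "w"),
])
JAVA_CODES = dict([
    ("byte", "c"), ("short", "s"), ("int", "i"), ("long", "x"),
    ("float", "f"), ("double", "d"), ("boolean", "b"), ("char", "w"),
])

def mangle_cxx_primitive(type_name):
    """Return the code for a C++ primitive, handling signed and unsigned."""
    if type_name == "signed char":
        return "Sc"
    if type_name.startswith("unsigned "):
        return "U" + CXX_CODES[type_name[len("unsigned "):]]
    return CXX_CODES[type_name]

print(mangle_cxx_primitive("unsigned long"))   # Ul
print(JAVA_CODES["long"])                      # x
```

Note that a Java @code{long} and a C++ @code{long long} share the code
@samp{x}, while a C++ @code{long} uses @samp{l}.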

@subsection Mangling of simple names

A simple class, package, template, or namespace name is
encoded as the number of characters in the name, followed by
the actual characters. Thus the class @code{Foo}
is encoded as @samp{3Foo}.

If any of the characters in the name are not alphanumeric
(i.e., not one of the standard ASCII letters, digits, or '_'),
or the initial character is a digit, then the name is
mangled as a sequence of encoded Unicode letters.
A Unicode encoding starts with a @samp{U} to indicate
that Unicode escapes are used, followed by the number of
bytes used by the Unicode encoding, followed by the bytes
representing the encoding. ASCII letters and
non-initial digits are encoded without change. However, all
other characters (including underscore and initial digits) are
translated into a sequence starting with an underscore,
followed by the big-endian 4-hex-digit lower-case encoding of the character.

If a method name contains Unicode-escaped characters, the
entire mangled method name is followed by a @samp{U}.

For example, the method @code{X\u0319::M\u002B(int)} is encoded as
@samp{M_002b__U6X_0319iU}.
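
A minimal Python sketch of the name encoding described above follows.
The function names are hypothetical, and the sketch assumes every
character fits in four hex digits (that is, it lies in the Basic
Multilingual Plane).

```python
import string

LETTERS = string.ascii_letters
DIGITS = string.digits

def needs_escape(name):
    """True if the name has a non-alphanumeric character or a leading digit."""
    plain = LETTERS + DIGITS + "_"
    return name[0] in DIGITS or any(ch not in plain for ch in name)

def escape(name):
    """Keep ASCII letters and non-initial digits; turn the rest into _xxxx."""
    out = ""
    for i, ch in enumerate(name):
        if ch in LETTERS or (ch in DIGITS and i > 0):
            out += ch
        else:
            out += "_%04x" % ord(ch)    # underscore + 4 lower-case hex digits
    return out

def encode_name(name):
    """Encode one simple class, package, template, or namespace name."""
    if not needs_escape(name):
        return str(len(name)) + name
    body = escape(name)
    return "U" + str(len(body)) + body

print(encode_name("Foo"))        # 3Foo
print(encode_name("X\u0319"))    # U6X_0319
print(escape("M+"))              # M_002b
```

Joining the pieces by hand, @code{escape("M+")}, then @samp{__}, then
@code{encode_name("X\u0319")}, then @samp{i} for the parameter and the
trailing @samp{U}, gives the mangled name shown in the example above.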

@subsection Pointer and reference types

A C++ pointer type is mangled as @samp{P} followed by the
mangling of the type pointed to.

A C++ reference type is mangled as @samp{R} followed by the
mangling of the type referenced.

A Java object reference type is equivalent
to a C++ pointer parameter, so we mangle such a parameter type
as @samp{P} followed by the mangling of the class name.
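
Because both codes are simple prefixes, nested pointer and reference
types can be mangled recursively. The sketch below assumes a
hypothetical representation in which a type is either an already-mangled
string or a @code{(kind, inner)} pair.

```python
def mangle_type(t):
    """Mangle an already-mangled string or a nested (kind, inner) tuple."""
    if isinstance(t, str):
        return t                      # primitive code or encoded class name
    kind, inner = t
    if kind == "pointer":
        return "P" + mangle_type(inner)
    if kind == "reference":
        return "R" + mangle_type(inner)
    raise ValueError("unknown type constructor: " + kind)

print(mangle_type(("pointer", ("pointer", "c"))))   # PPc   (char **)
print(mangle_type(("reference", "3Foo")))           # R3Foo (Foo &)
print(mangle_type(("pointer", "3Foo")))             # P3Foo (Foo *, or a Java reference to Foo)
```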

@subsection Squangled type compression

Squangling (enabled with the @samp{-fsquangle} option) uses
the @samp{B} code to indicate reuse of a previously
seen type within an identifier. Types are recognized in a left-to-right
manner and given increasing values, which are
appended to the code in the standard manner; that is, multiple-digit numbers
are delimited by @samp{_} characters. A type is considered to be any
non-primitive type, regardless of whether it is a parameter, template
parameter, or entire template. Certain codes are considered modifiers
of a type, and are not included as part of the type. These are the
@samp{C}, @samp{V}, @samp{P}, @samp{A}, @samp{R}, and @samp{U} codes,
denoting constant, volatile, pointer, array, reference, and unsigned.
These codes may precede a @samp{B} type in order to make the required
modifications to the type.

For example:
@example
template <class T> class class1 @{ @};

template <class T> class class2 @{ @};

class class3 @{ @};

int f(class2<class1<class3> > a, int b, const class1<class3>& c, class3 *d) @{ @}

    B0 -> class2<class1<class3> >
    B1 -> class1<class3>
    B2 -> class3
@end example
This produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}.
The @code{int} parameter is a basic type, and does not receive a @samp{B} encoding...
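
The numbering in the example can be reproduced with a short Python
sketch. It is an assumption-laden illustration, not the G++ algorithm:
types are modelled as hypothetical tuples, primitives as plain strings,
and the function @code{number_types} only assigns the @samp{B} indices
without emitting a mangled name.

```python
def number_types(params):
    """Assign B indices to non-primitive types in left-to-right order."""
    seen = []                           # position in this list is the B index

    def visit(t):
        if isinstance(t, str):          # primitive: never numbered
            return
        if t[0] == "mod":               # C, V, P, A, R, U: look through them
            visit(t[2])
            return
        if t not in seen:
            seen.append(t)
        for arg in t[2]:                # then the template arguments
            visit(arg)

    for p in params:
        visit(p)
    return seen

# The types from the example above, in a made-up tuple notation.
class3 = ("class", "class3", ())
class1 = ("class", "class1", (class3,))
class2 = ("class", "class2", (class1,))
params = [class2, "int", ("mod", "R", ("mod", "C", class1)), ("mod", "P", class3)]

for index, t in enumerate(number_types(params)):
    print("B" + str(index), "->", t[1])
```

The third and fourth parameters introduce no new types, which is why
they can be written as @samp{RCB1} and @samp{PB2}.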

@subsection Qualified names

Both C++ and Java allow a class to be lexically nested inside another
class. C++ also supports namespaces (not yet implemented by G++).
Java also supports packages.

These are all mangled the same way: first the letter @samp{Q}
indicates that we are emitting a qualified name.
That is followed by the number of parts in the qualified name.
If that number is 9 or less, it is emitted with no delimiters.
Otherwise, an underscore is written before and after the count.
Then follows each part of the qualified name, as described above.

For example, @code{Foo::\u0319::Bar} is encoded as
@samp{Q33FooU5_03193Bar}.
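
The count rule is easy to get wrong, so here is a minimal Python sketch
of it. The function name @code{mangle_qualified} is hypothetical, and
the parts are assumed to be encoded already as simple names.

```python
def mangle_qualified(encoded_parts):
    """Join already-encoded name parts into a Q-qualified name."""
    n = len(encoded_parts)
    if n <= 9:
        count = str(n)                  # single digit, no delimiters
    else:
        count = "_" + str(n) + "_"      # underscore before and after
    return "Q" + count + "".join(encoded_parts)

print(mangle_qualified(["3Foo", "U5_0319", "3Bar"]))   # Q33FooU5_03193Bar
print(mangle_qualified(["4java", "4lang", "6String"])) # Q34java4lang6String
```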

Squangling uses the letter @samp{K} to indicate a
remembered portion of a qualified name. As qualified names are processed
for an identifier, the names are numbered and remembered in a
manner similar to the @samp{B} type compression code.
Names are recognized left to right, and given increasing values, which are
appended to the code in the standard manner; that is, multiple-digit numbers
are delimited by @samp{_} characters.

For example:
@example
class Andrew
@{
public:
  class WasHere
  @{
  public:
    class AndHereToo
    @{
    @};
  @};
@};

void f(Andrew& r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @}

    K0 -> Andrew
    K1 -> Andrew::WasHere
    K2 -> Andrew::WasHere::AndHereToo
@end example
Function @samp{f()} would be mangled as:
@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}
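
The following Python sketch reproduces the three parameter encodings of
this example. It is reverse-engineered from the example alone, so treat
it as a guess at the behavior rather than a description of the G++
code: the class @code{NameCompressor} is hypothetical, it handles only
ASCII names, and it ignores indices and counts above 9.

```python
def simple(name):
    """Encode an ASCII-only simple name as length + characters."""
    return str(len(name)) + name

class NameCompressor:
    def __init__(self):
        self.remembered = []            # position in this list is the K index

    def mangle(self, parts):
        parts = tuple(parts)
        # Find the longest remembered proper prefix, if there is one.
        prefix_len = 0
        for k in range(len(parts) - 1, 0, -1):
            if parts[:k] in self.remembered:
                prefix_len = k
                break
        pieces = []
        if prefix_len:
            pieces.append("K" + str(self.remembered.index(parts[:prefix_len])))
        pieces.extend(simple(p) for p in parts[prefix_len:])
        # Remember every prefix not seen before, shortest first.
        for k in range(1, len(parts) + 1):
            if parts[:k] not in self.remembered:
                self.remembered.append(parts[:k])
        if len(pieces) == 1:
            return pieces[0]
        return "Q" + str(len(pieces)) + "".join(pieces)

names = NameCompressor()
print(names.mangle(["Andrew"]))                           # 6Andrew
print(names.mangle(["Andrew", "WasHere"]))                # Q2K07WasHere
print(names.mangle(["Andrew", "WasHere", "AndHereToo"]))  # Q2K110AndHereToo
```

Note that a @samp{K} code counts as one part, so both qualified names
are emitted with a count of 2.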

On some occasions either a @samp{B} or a @samp{K} code could
be chosen; preference is always given to the @samp{B} code. That is, the example
in the section on @samp{B} mangling could have used a @samp{K} code
instead of @samp{B2}.

@subsection Templates

A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters. If a template parameter is a type, it is written
as a @samp{Z} followed by the encoding of the type.
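
As a rough Python sketch of the class template rule, restricted to type
parameters: the function name @code{mangle_class_template} is
hypothetical, the template name is assumed to be plain ASCII, and the
arguments are assumed to be mangled already.

```python
def mangle_class_template(name, type_args):
    """Encode a class template instantiation whose arguments are all types."""
    out = "t" + str(len(name)) + name + str(len(type_args))
    for arg in type_args:
        out += "Z" + arg                # Z marks a type parameter
    return out

inner = mangle_class_template("class1", ["6class3"])
print(inner)                                      # t6class11Z6class3
print(mangle_class_template("class2", [inner]))   # t6class21Zt6class11Z6class3
```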

A function template specialization (either an instantiation or an
explicit specialization) is encoded by an @samp{H} followed by the
encoding of the template parameters, as described above, followed by an
@samp{_}, the encoding of the argument types to the template function
(not the specialization), another @samp{_}, and the return type. (Like
the argument types, the return type is the return type of the function
template, not the specialization.) Template parameters in the argument
and return types are encoded by an @samp{X} for type parameters, or a
@samp{Y} for constant parameters, an index indicating their position
in the template parameter list declaration, and their template depth.

@subsection Arrays

C++ array types are mangled by emitting @samp{A}, followed by
the length of the array, followed by an @samp{_}, followed by
the mangling of the element type. Of course, normally
array parameter types decay into pointer types, so you
don't see this.

Java arrays are objects. A Java type @code{T[]} is mangled
as if it were the C++ type @code{JArray<T>}.
For example, @code{java.lang.String[]} is encoded as
@samp{Pt6JArray1ZPQ34java4lang6String}.
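
Both array rules can be written down directly. The Python sketch below
follows the text as stated; the function names are hypothetical and the
element types are assumed to be mangled already.

```python
def mangle_cxx_array(length, element):
    """A, the array length, an underscore, then the element type."""
    return "A" + str(length) + "_" + element

def mangle_java_array(element):
    """Treat T[] as a reference (pointer) to the instantiation JArray<T>."""
    return "P" + "t" + "6JArray" + "1" + "Z" + element

print(mangle_cxx_array(10, "i"))                    # A10_i
print(mangle_java_array("PQ34java4lang6String"))    # Pt6JArray1ZPQ34java4lang6String
```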

@subsection Static fields

Both C++ and Java classes can have static fields.
These are allocated statically, and are shared among all instances.

The mangling starts with a prefix (@samp{_} in most systems), which is
followed by the mangling
of the class name, followed by the "joiner" and finally the field name.
The joiner (see @code{JOINER} in @code{cp-tree.h}) is a special
separator character. For historical reasons (and idiosyncrasies
of assembler syntax) it can be @samp{$} or @samp{.} (or even
@samp{_} on a few systems). If the joiner is @samp{_} then the prefix
is @samp{__static_} instead of just @samp{_}.

For example, @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax)
would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var}
(or rarely @samp{__static_Q23Foo3Bar_var}).

If the name of a static variable needs Unicode escapes,
the Unicode indicator @samp{U} comes before the "joiner".
Thus @code{\u1234Foo::var\u3445} becomes @code{_U8_1234FooU.var_3445}.
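
The three joiner variants and the Unicode marker can be summarized in a
short Python sketch. The function @code{mangle_static_field} and its
parameters are hypothetical; the class and field names are assumed to
be encoded already.

```python
def mangle_static_field(class_mangling, field, joiner="$", uses_unicode=False):
    """Prefix, class mangling, optional U marker, joiner, field name."""
    prefix = "__static_" if joiner == "_" else "_"
    marker = "U" if uses_unicode else ""    # the indicator precedes the joiner
    return prefix + class_mangling + marker + joiner + field

print(mangle_static_field("Q23Foo3Bar", "var", "$"))   # _Q23Foo3Bar$var
print(mangle_static_field("Q23Foo3Bar", "var", "."))   # _Q23Foo3Bar.var
print(mangle_static_field("Q23Foo3Bar", "var", "_"))   # __static_Q23Foo3Bar_var
print(mangle_static_field("U8_1234Foo", "var_3445", ".", uses_unicode=True))
```

The last call prints @samp{_U8_1234FooU.var_3445}, matching the Unicode
example above.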

@subsection Table of demangling code characters

The following special characters are used in mangling:

@table @samp
@item A
Indicates a C++ array type.

@item b
Encodes the C++ @code{bool} type,
and the Java @code{boolean} type.

@item B
Used for squangling. Similar in concept to the @samp{T} non-squangled code.

@item c
Encodes the C++ @code{char} type, and the Java @code{byte} type.

@item C
A modifier to indicate a @code{const} type.
Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).

@item d
Encodes the C++ and Java @code{double} types.

@item e
Indicates extra unknown arguments @code{...}.

@item E
Indicates the opening parenthesis of an expression.

@item f
Encodes the C++ and Java @code{float} types.

@item F
Used to indicate a function type.

@item H
Used to indicate a template function.

@item i
Encodes the C++ and Java @code{int} types.

@item J
Indicates a complex type.

@item K
Used by squangling to compress qualified names.

@item l
Encodes the C++ @code{long} type.

@item P
Indicates a pointer type. Followed by the type pointed to.

@item Q
Used to mangle qualified names, which arise from nested classes.
Should also be used for namespaces (?).
In Java used to mangle package-qualified names, and inner classes.

@item r
Encodes the GNU C++ @code{long double} type.

@item R
Indicates a reference type. Followed by the referenced type.

@item s
Encodes the C++ and Java @code{short} types.

@item S
A modifier that indicates that the following integer type is signed.
Only used with @code{char}.

Also used as a modifier to indicate a static member function.

@item t
Indicates a template instantiation.

@item T
A back reference to a previously seen type.

@item U
A modifier that indicates that the following integer type is unsigned.
Also used to indicate that the following class or namespace name
is encoded using Unicode-mangling.

@item v
Encodes the C++ and Java @code{void} types.

@item V
A modifier for a @code{volatile} type or method.

@item w
Encodes the C++ @code{wchar_t} type, and the Java @code{char} type.

@item W
Indicates the closing parenthesis of an expression.

@item x
Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.

@item X
Encodes a template type parameter, when part of a function type.

@item Y
Encodes a template constant parameter, when part of a function type.

@item Z
Used for template type parameters.

@end table

The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
also seem to be used for obscure purposes ...

@node Concept Index, , Mangling, Top

@section Concept Index

@printindex cp

@bye