Commit | Line | Data |
---|---|---|
8d08fdba MS |
1 | \input texinfo @c -*-texinfo-*- |
2 | @c %**start of header | |
3 | @setfilename g++int.info | |
4 | @settitle G++ internals | |
5 | @setchapternewpage odd | |
6 | @c %**end of header | |
7 | ||
8 | @node Top, Limitations of g++, (dir), (dir) | |
9 | @chapter Internal Architecture of the Compiler | |
10 | ||
8d2733ca | 11 | This is meant to describe the C++ front-end for gcc in detail. |
4c5f3fcd | 12 | Questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}. |
8d08fdba MS |
13 | |
14 | @menu | |
15 | * Limitations of g++:: | |
16 | * Routines:: | |
17 | * Implementation Specifics:: | |
18 | * Glossary:: | |
19 | * Macros:: | |
20 | * Typical Behavior:: | |
21 | * Coding Conventions:: | |
22 | * Templates:: | |
23 | * Access Control:: | |
24 | * Error Reporting:: | |
51c184be MS |
25 | * Parser:: |
26 | * Copying Objects:: | |
f0e01782 MS |
27 | * Exception Handling:: |
28 | * Free Store:: | |
42976354 | 29 | * Mangling:: Function name mangling for C++ and Java |
8d08fdba MS |
30 | * Concept Index:: |
31 | @end menu | |
32 | ||
33 | @node Limitations of g++, Routines, Top, Top | |
34 | @section Limitations of g++ | |
35 | ||
36 | @itemize @bullet | |
37 | @item | |
38 | Limitations on input source code: 240 nesting levels with the parser | |
39 | stack size (YYSTACKSIZE) set to 500 (the default); this requires around | |
40 | 16.4k of swap space per nesting level. The parser needs about 2.09 * | |
41 | number of nesting levels worth of stackspace. | |
42 | ||
43 | @cindex pushdecl_class_level | |
44 | @item | |
45 | I suspect there are other uses of pushdecl_class_level that do not call | |
46 | set_identifier_type_value in tandem with the call to | |
47 | pushdecl_class_level. It would seem to be an omission. | |
48 | ||
8d08fdba MS |
49 | @cindex access checking |
50 | @item | |
f0e01782 | 51 | Access checking is unimplemented for nested types. |
8d08fdba MS |
52 | |
53 | @cindex @code{volatile} | |
54 | @item | |
55 | @code{volatile} is not implemented in general. | |
56 | ||
8d08fdba MS |
57 | @end itemize |
58 | ||
59 | @node Routines, Implementation Specifics, Limitations of g++, Top | |
60 | @section Routines | |
61 | ||
62 | This section describes some of the routines used in the C++ front-end. | |
63 | ||
64 | @code{build_vtable} and @code{prepare_fresh_vtable} are used only within | |
65 | the @file{cp-class.c} file, and only in @code{finish_struct} and | |
66 | @code{modify_vtable_entries}. | |
67 | ||
68 | @code{build_vtable}, @code{prepare_fresh_vtable}, and | |
69 | @code{finish_struct} are the only routines that set @code{DECL_VPARENT}. | |
70 | ||
71 | @code{finish_struct} can steal the virtual function table from parents; | |
72 | this prevents @code{related_vslot} from working. When @code{finish_struct} steals, | |
73 | we know that | |
74 | ||
75 | @example | |
76 | get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0) | |
77 | @end example | |
78 | ||
79 | @noindent | |
80 | will get the related binfo. | |
81 | ||
82 | @code{layout_basetypes} does something with the VIRTUALS. | |
83 | ||
84 | Supposedly (according to Tiemann) most of the breadth-first searching | |
85 | done, as in @code{get_base_distance} and @code{get_binfo}, was not | |
86 | the result of any design decision. I have since found out that at least one | |
87 | part of the compiler needs the notion of depth-first binfo searching, so I | |
88 | am going to try to convert the whole thing; it should just work. The | |
89 | term left-most refers to the depth first left-most node. It uses | |
90 | @code{MAIN_VARIANT == type} as the condition to get left-most, because | |
91 | the things that have @code{BINFO_OFFSET}s of zero are shared and will | |
92 | have themselves as their own @code{MAIN_VARIANT}s. The non-shared right | |
93 | ones are copies of the left-most one; hence if a node is its own | |
6de129de | 94 | @code{MAIN_VARIANT}, we know it IS a left-most one; if it is not, it is |
8d08fdba MS |
95 | a non-left-most one. |
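
The left-most test described above can be sketched on a toy lattice. This is only an illustration under stated assumptions: @code{ToyBinfo}, @code{is_left_most} and @code{find_left_most} are hypothetical names that do not exist in the compiler, and the @code{main_variant} pointer merely models the ``is its own @code{MAIN_VARIANT}'' condition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a binfo. All names here are hypothetical.
struct ToyBinfo {
    std::string type;                     // class this node stands for
    const ToyBinfo *main_variant;         // itself when the node is left-most
    std::vector<const ToyBinfo *> bases;  // direct bases, left to right
};

// Models the "MAIN_VARIANT == type" condition from the text.
bool is_left_most(const ToyBinfo &b) { return b.main_variant == &b; }

// Depth-first, left-to-right search: the first hit is the left-most node.
const ToyBinfo *find_left_most(const ToyBinfo &from, const std::string &wanted) {
    if (from.type == wanted) return &from;
    for (const ToyBinfo *base : from.bases)
        if (const ToyBinfo *hit = find_left_most(*base, wanted)) return hit;
    return nullptr;
}

// Usage: D derives from B and C; B and C each derive (non-virtually) from A.
void check_left_most() {
    ToyBinfo a_left{"A", nullptr, {}};
    a_left.main_variant = &a_left;       // shared, left-most copy
    ToyBinfo a_right{"A", &a_left, {}};  // right-hand copy of the left-most one
    ToyBinfo b{"B", nullptr, {&a_left}};
    b.main_variant = &b;
    ToyBinfo c{"C", nullptr, {&a_right}};
    c.main_variant = &c;
    ToyBinfo d{"D", nullptr, {&b, &c}};
    d.main_variant = &d;

    const ToyBinfo *hit = find_left_most(d, "A");
    assert(hit == &a_left);
    assert(is_left_most(*hit));
    assert(!is_left_most(a_right));
}
```

A breadth-first search would also reach an A at the same depth here; the sketch only shows how the self-pointer test separates the left-most copy from the right-hand ones.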
96 | ||
97 | @code{get_base_distance}'s path and distance matter in its use in: | |
98 | ||
99 | @itemize @bullet | |
100 | @item | |
101 | @code{prepare_fresh_vtable} (the code is probably wrong) | |
102 | @item | |
103 | @code{init_vfields} depends upon distance, probably in a safe way; | |
104 | @code{build_offset_ref} might use partial paths to do further lookups; | |
105 | @code{hack_identifier} is probably not properly checking access. | |
106 | ||
107 | @item | |
108 | @code{get_first_matching_virtual} probably should check for | |
109 | @code{get_base_distance} returning -2. | |
110 | ||
111 | @item | |
112 | @code{resolve_offset_ref} should be called in a more deterministic | |
113 | manner. Right now, it is called in some random contexts, like for | |
114 | arguments at @code{build_method_call} time, @code{default_conversion} | |
115 | time, @code{convert_arguments} time, @code{build_unary_op} time, | |
116 | @code{build_c_cast} time, @code{build_modify_expr} time, | |
117 | @code{convert_for_assignment} time, and | |
118 | @code{convert_for_initialization} time. | |
119 | ||
120 | But there are still more contexts in which it needs to be called; one was the | |
121 | ever simple: | |
122 | ||
123 | @example | |
124 | if (obj.*pmi != 7) | |
125 | @dots{} | |
126 | @end example | |
127 | ||
128 | It seems that the problems were due to the fact that the @code{TREE_TYPE} of | |
129 | the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type | |
130 | of the referent (like @code{INTEGER_TYPE}). This problem was fixed by | |
131 | changing @code{default_conversion} to check @code{TREE_CODE (x)}, | |
132 | instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it | |
133 | was @code{OFFSET_TYPE}. | |
134 | ||
135 | @end itemize | |
136 | ||
137 | @node Implementation Specifics, Glossary, Routines, Top | |
138 | @section Implementation Specifics | |
139 | ||
140 | @itemize @bullet | |
141 | @item Explicit Initialization | |
142 | ||
143 | The global list @code{current_member_init_list} contains the list of | |
144 | mem-initializers specified in a constructor declaration. For example: | |
145 | ||
146 | @example | |
147 | foo::foo() : a(1), b(2) @{@} | |
148 | @end example | |
149 | ||
150 | @noindent | |
151 | will initialize @samp{a} with 1 and @samp{b} with 2. | |
152 | @code{expand_member_init} places each initialization (a with 1) on the | |
153 | global list. Then, when the fndecl is being processed, | |
154 | @code{emit_base_init} runs down the list, initializing them. It used to | |
155 | be the case that g++ first ran down @code{current_member_init_list}, | |
156 | then ran down the list of members initializing the ones that weren't | |
157 | explicitly initialized. Things were rewritten to perform the | |
158 | initializations in order of declaration in the class. So, for the above | |
159 | example, @samp{a} and @samp{b} will be initialized in the order that | |
160 | they were declared: | |
161 | ||
162 | @example | |
163 | class foo @{ public: int b; int a; foo (); @}; | |
164 | @end example | |
165 | ||
166 | @noindent | |
167 | Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be | |
168 | initialized with 1, regardless of how they're listed in the mem-initializer. | |
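
The declaration-order rule can be observed from user code. The following is a minimal sketch, not compiler code: @code{trace}, @code{note}, @code{init_order} and @code{check_init_order} are hypothetical helpers added only to record the order, and the pragma is there because compilers commonly warn about this mismatch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for this sketch: record which member is initialized.
static std::string trace;

static int note(char member, int value) {
    trace += member;
    return value;
}

// Same shape as the class in the text: b is declared before a.
class foo {
public:
    int b;
    int a;
    foo();
};

// The mem-initializer list names a first, which compilers usually warn about.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wreorder"
#endif
foo::foo() : a(note('a', 1)), b(note('b', 2)) {}

// Returns the observed initialization order as a string of member names.
std::string init_order() {
    trace.clear();
    foo f;
    assert(f.a == 1 && f.b == 2);
    return trace;
}

// Usage: b is initialized first because it is declared first.
void check_init_order() {
    assert(init_order() == "ba");
}
```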
169 | ||
170 | @item Argument Matching | |
171 | ||
172 | In early 1993, the argument matching scheme in @sc{gnu} C++ changed | |
173 | significantly. The original code was completely replaced with a new | |
174 | method that will, hopefully, be easier to understand and make fixing | |
175 | specific cases much easier. | |
176 | ||
177 | The @samp{-fansi-overloading} option is used to enable the new code; at | |
178 | some point in the future, it will become the default behavior of the | |
179 | compiler. | |
180 | ||
181 | The file @file{cp-call.c} contains all of the new work, in the functions | |
182 | @code{rank_for_overload}, @code{compute_harshness}, | |
183 | @code{compute_conversion_costs}, and @code{ideal_candidate}. | |
184 | ||
185 | Instead of using obscure numerical values, the quality of an argument | |
186 | match is now represented by clear, individual codes. The new data | |
187 | structure @code{struct harshness} (it used to be an @code{unsigned} | |
188 | number) contains: | |
189 | ||
190 | @enumerate a | |
191 | @item the @samp{code} field, to signify what was involved in matching two | |
192 | arguments; | |
193 | @item the @samp{distance} field, used in situations where inheritance | |
194 | decides which function should be called (one is ``closer'' than | |
195 | another); | |
196 | @item and the @samp{int_penalty} field, used by some codes as a tie-breaker. | |
197 | @end enumerate | |
198 | ||
199 | The @samp{code} field is a number with a given bit set for each type of | |
200 | code, OR'd together. The new codes are: | |
201 | ||
202 | @itemize @bullet | |
203 | @item @code{EVIL_CODE} | |
204 | The argument was not a permissible match. | |
205 | ||
206 | @item @code{CONST_CODE} | |
207 | Currently, this is only used by @code{compute_conversion_costs}, to | |
208 | distinguish when a non-@code{const} member function is called from a | |
209 | @code{const} member function. | |
210 | ||
211 | @item @code{ELLIPSIS_CODE} | |
212 | A match against an ellipsis @samp{...} is considered worse than all others. | |
213 | ||
214 | @item @code{USER_CODE} | |
215 | Used for a match involving a user-defined conversion. | |
216 | ||
217 | @item @code{STD_CODE} | |
218 | A match involving a standard conversion. | |
219 | ||
220 | @item @code{PROMO_CODE} | |
221 | A match involving an integral promotion. For these, the | |
222 | @code{int_penalty} field is used to handle the ARM's rule (XXX cite) | |
223 | that a smaller @code{unsigned} type should promote to an @code{int}, not | |
224 | to an @code{unsigned int}. | |
225 | ||
226 | @item @code{QUAL_CODE} | |
227 | Used to mark use of qualifiers like @code{const} and @code{volatile}. | |
228 | ||
229 | @item @code{TRIVIAL_CODE} | |
230 | Used for trivial conversions. The @samp{int_penalty} field is used by | |
231 | @code{convert_harshness} to communicate further penalty information back | |
232 | to @code{build_overload_call_real} when deciding which function should | |
233 | be called. | |
234 | @end itemize | |
235 | ||
236 | The functions @code{convert_to_aggr} and @code{build_method_call} use | |
237 | @code{compute_conversion_costs} to rate each argument's suitability for | |
238 | a given candidate function (that's how we get the list of candidates for | |
239 | @code{ideal_candidate}). | |
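
The three fields and the OR'd code bits described above might be pictured as follows. This is a hedged sketch: the field names come from the text, but the bit values, @code{is_evil}, @code{better_match} and @code{check_ranking} are assumptions made for illustration, and the ranking rule is a simplification rather than the logic in @file{cp-call.c}.

```cpp
#include <cassert>

// Assumed bit values: the text only says each code has its own bit.
enum harshness_code : unsigned {
    TRIVIAL_CODE  = 1u << 0,
    QUAL_CODE     = 1u << 1,
    PROMO_CODE    = 1u << 2,
    STD_CODE      = 1u << 3,
    USER_CODE     = 1u << 4,
    ELLIPSIS_CODE = 1u << 5,
    CONST_CODE    = 1u << 6,
    EVIL_CODE     = 1u << 7
};

// The three fields named in the text.
struct harshness {
    unsigned code;    // OR of harshness_code bits
    int distance;     // inheritance distance, smaller is "closer"
    int int_penalty;  // tie-breaker used by some codes
};

// Hypothetical helper: true when the argument is not a permissible match.
bool is_evil(const harshness &h) { return (h.code & EVIL_CODE) != 0; }

// Hypothetical ranking: compare codes, then distance, then penalty.
bool better_match(const harshness &a, const harshness &b) {
    if (a.code != b.code) return a.code < b.code;
    if (a.distance != b.distance) return a.distance < b.distance;
    return a.int_penalty < b.int_penalty;
}

// Usage: a promotion beats a standard conversion, and a closer base wins.
void check_ranking() {
    harshness promo{PROMO_CODE, 0, 0};
    harshness standard{STD_CODE, 0, 0};
    harshness near_base{STD_CODE, 1, 0};
    harshness far_base{STD_CODE, 2, 0};
    assert(better_match(promo, standard));
    assert(better_match(near_base, far_base));
    assert(!is_evil(promo));
}
```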
240 | ||
e050253a BK |
241 | @item The Explicit Keyword |
242 | ||
243 | When @code{explicit} appears on a constructor, @code{grokdeclarator} | |
244 | sets the field @code{DECL_NONCONVERTING_P}. That value is used by | |
245 | @code{build_method_call} and @code{build_user_type_conversion_1} to decide | |
246 | if a particular constructor should be used as a candidate for conversions. | |
247 | ||
8d08fdba MS |
248 | @end itemize |
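
At the source level, the effect recorded by @code{DECL_NONCONVERTING_P} for @code{explicit} constructors looks roughly like this. The sketch shows only standard C++ behavior, not the compiler's internals; @code{Meters}, @code{Feet}, @code{feet_value} and @code{check_explicit} are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// An explicit constructor is not a candidate for implicit conversions.
struct Meters {
    explicit Meters(int v) : value(v) {}
    int value;
};

// A non-explicit constructor is a converting constructor.
struct Feet {
    Feet(int v) : value(v) {}
    int value;
};

int feet_value(Feet f) { return f.value; }

static_assert(!std::is_convertible<int, Meters>::value,
              "explicit constructor must not convert implicitly");
static_assert(std::is_convertible<int, Feet>::value,
              "non-explicit constructor converts implicitly");
static_assert(std::is_constructible<Meters, int>::value,
              "direct initialization still works");

// Usage: int converts to Feet at the call; Meters needs an explicit call.
void check_explicit() {
    assert(feet_value(3) == 3);
    assert(Meters(5).value == 5);
}
```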
249 | ||
250 | @node Glossary, Macros, Implementation Specifics, Top | |
251 | @section Glossary | |
252 | ||
253 | @table @r | |
254 | @item binfo | |
255 | The main data structure in the compiler used to represent the | |
256 | inheritance relationships between classes. The data in the binfo can be | |
257 | accessed by the BINFO_ accessor macros. | |
258 | ||
259 | @item vtable | |
260 | @itemx virtual function table | |
261 | ||
262 | The virtual function table holds information used in virtual function | |
263 | dispatching. In the compiler, they are usually referred to as vtables, | |
264 | or vtbls. The first index is not used in the normal way; I believe it | |
265 | is probably used for the virtual destructor. | |
266 | ||
267 | @item vfield | |
268 | ||
269 | vfields can be thought of as the base information needed to build | |
270 | vtables. For every vtable that exists for a class, there is a vfield. | |
271 | See also vtable and virtual function table pointer. When a type is used | |
272 | as a base class to another type, the virtual function table for the | |
273 | derived class can be based upon the vtable for the base class, just | |
274 | extended to include the additional virtual methods declared in the | |
275 | derived class. The virtual function table from a virtual base class is | |
276 | never reused in a derived class. @code{is_normal} depends upon this. | |
277 | ||
278 | @item virtual function table pointer | |
279 | ||
280 | These are @code{FIELD_DECL}s that are pointer types that point to | |
281 | vtables. See also vtable and vfield. | |
282 | @end table | |
283 | ||
284 | @node Macros, Typical Behavior, Glossary, Top | |
285 | @section Macros | |
286 | ||
287 | This section describes some of the macros used on trees. The list | |
288 | should be alphabetical. Eventually all macros should be documented | |
e349ee73 | 289 | here. |
8d08fdba MS |
290 | |
291 | @table @code | |
292 | @item BINFO_BASETYPES | |
293 | A vector of additional binfos for the types inherited by this basetype. | |
294 | The binfos are fully unshared (except for virtual bases, in which | |
295 | case the binfo structure is shared). | |
296 | ||
297 | If this basetype describes type D as inherited in C, | |
298 | and if the basetypes of D are E and F, | |
299 | then this vector contains binfos for inheritance of E and F by C. | |
300 | ||
301 | Has values of: | |
302 | ||
303 | TREE_VECs | |
304 | ||
305 | ||
306 | @item BINFO_INHERITANCE_CHAIN | |
307 | Temporarily used to represent specific inheritances. It usually points | |
308 | to the binfo associated with the lesser derived type, but it can be | |
309 | reversed by reverse_path. For example: | |
310 | ||
311 | @example | |
312 | Z ZbY least derived | |
313 | | | |
314 | Y YbX | |
315 | | | |
316 | X Xb most derived | |
317 | ||
318 | TYPE_BINFO (X) == Xb | |
319 | BINFO_INHERITANCE_CHAIN (Xb) == YbX | |
320 | BINFO_INHERITANCE_CHAIN (Yb) == ZbY | |
321 | BINFO_INHERITANCE_CHAIN (Zb) == 0 | |
322 | @end example | |
323 | ||
324 | Not sure if the above is really true; get_base_distance has it point | |
325 | towards the most derived type, opposite from above. | |
326 | ||
327 | Set by build_vbase_path, recursive_bounded_basetype_p, | |
328 | get_base_distance, lookup_field, lookup_fnfields, and reverse_path. | |
329 | ||
330 | What things can this be used on: | |
331 | ||
332 | TREE_VECs that are binfos | |
333 | ||
334 | ||
335 | @item BINFO_OFFSET | |
336 | The offset where this basetype appears in its containing type. | |
337 | BINFO_OFFSET slot holds the offset (in bytes) from the base of the | |
338 | complete object to the base of the part of the object that is allocated | |
339 | on behalf of this `type'. This is always 0 except when there is | |
340 | multiple inheritance. | |
341 | ||
342 | Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example. | |
343 | ||
344 | ||
345 | @item BINFO_VIRTUALS | |
346 | A unique list of functions for the virtual function table. See also | |
347 | TYPE_BINFO_VIRTUALS. | |
348 | ||
349 | What things can this be used on: | |
350 | ||
351 | TREE_VECs that are binfos | |
352 | ||
353 | ||
354 | @item BINFO_VTABLE | |
355 | Used to find the VAR_DECL that is the virtual function table associated | |
356 | with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual | |
357 | function table pointer, see CLASSTYPE_VFIELD. | |
358 | ||
359 | What things can this be used on: | |
360 | ||
361 | TREE_VECs that are binfos | |
362 | ||
363 | Has values of: | |
364 | ||
365 | VAR_DECLs that are virtual function tables | |
366 | ||
367 | ||
368 | @item BLOCK_SUPERCONTEXT | |
369 | In the outermost scope of each function, it points to the FUNCTION_DECL | |
370 | node. It aids in better DWARF support of inline functions. | |
371 | ||
372 | ||
373 | @item CLASSTYPE_TAGS | |
374 | CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a | |
375 | class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans | |
376 | these and calls pushtag on them.) | |
377 | ||
378 | finish_struct scans these to produce TYPE_DECLs to add to the | |
379 | TYPE_FIELDS of the type. | |
380 | ||
381 | It is expected that the name found in the TREE_PURPOSE slot is unique; | |
382 | resolve_scope_to_name is one place that depends upon this | |
383 | uniqueness. | |
384 | ||
385 | ||
386 | @item CLASSTYPE_METHOD_VEC | |
387 | The following is true after finish_struct has been called (on the | |
388 | class?) but not before. Before finish_struct is called, things are | |
389 | different to some extent. Contains a TREE_VEC of methods of the class. | |
390 | The TREE_VEC_LENGTH is the number of differently named methods plus one | |
391 | for the 0th entry. The 0th entry is always allocated, and reserved for | |
392 | ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE. | |
393 | Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL, | |
394 | there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a | |
395 | given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next | |
396 | method that has the same name (but a different signature). It would | |
397 | seem that it is not true that because the DECL_CHAIN slot is used in | |
398 | this way, we cannot call pushdecl to put the method in the global scope | |
399 | (because that would overwrite the TREE_CHAIN slot), since they use | |
400 | different _CHAINs. finish_struct_methods sets up one version of the | |
401 | TREE_CHAIN slots on the FUNCTION_DECLs. | |
402 | ||
403 | friends are kept in TREE_LISTs, so that there's no need to use their | |
404 | TREE_CHAIN slot for anything. | |
405 | ||
406 | Has values of: | |
407 | ||
408 | TREE_VECs | |
409 | ||
410 | ||
411 | @item CLASSTYPE_VFIELD | |
412 | Seems to be in the process of being renamed TYPE_VFIELD. Use on types | |
413 | to get the main virtual function table pointer. To get the virtual | |
414 | function table use BINFO_VTABLE (TYPE_BINFO ()). | |
415 | ||
416 | Has values of: | |
417 | ||
418 | FIELD_DECLs that are virtual function table pointers | |
419 | ||
420 | What things can this be used on: | |
421 | ||
422 | RECORD_TYPEs | |
423 | ||
424 | ||
425 | @item DECL_CLASS_CONTEXT | |
426 | Identifies the context that the _DECL was found in. For virtual function | |
427 | tables, it points to the type associated with the virtual function | |
428 | table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT. | |
429 | ||
430 | The difference between this and DECL_CONTEXT is illustrated by virtual | |
431 | functions like: | |
432 | ||
433 | @example | |
434 | struct A | |
435 | @{ | |
436 | virtual int f (); | |
437 | @}; | |
438 | ||
439 | struct B : A | |
440 | @{ | |
441 | int f (); | |
442 | @}; | |
443 | ||
444 | DECL_CONTEXT (A::f) == A | |
445 | DECL_CLASS_CONTEXT (A::f) == A | |
446 | ||
447 | DECL_CONTEXT (B::f) == A | |
448 | DECL_CLASS_CONTEXT (B::f) == B | |
449 | @end example | |
450 | ||
451 | Has values of: | |
452 | ||
453 | RECORD_TYPEs, or UNION_TYPEs | |
454 | ||
455 | What things can this be used on: | |
456 | ||
457 | TYPE_DECLs, _DECLs | |
458 | ||
459 | ||
460 | @item DECL_CONTEXT | |
461 | Identifies the context that the _DECL was found in. Can be used on | |
462 | virtual function tables to find the type associated with the virtual | |
463 | function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a | |
464 | better access method. Internally the same as DECL_FIELD_CONTEXT, so | |
465 | don't use both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and | |
466 | DECL_CLASS_CONTEXT. | |
467 | ||
468 | Has values of: | |
469 | ||
470 | RECORD_TYPEs | |
471 | ||
472 | ||
473 | What things can this be used on: | |
474 | ||
475 | @display | |
476 | VAR_DECLs that are virtual function tables | |
477 | _DECLs | |
478 | @end display | |
479 | ||
480 | ||
481 | @item DECL_FIELD_CONTEXT | |
482 | Identifies the context that the FIELD_DECL was found in. Internally the | |
483 | same as DECL_CONTEXT, so don't use both. See also DECL_CONTEXT, | |
484 | DECL_FCONTEXT and DECL_CLASS_CONTEXT. | |
485 | ||
486 | Has values of: | |
487 | ||
488 | RECORD_TYPEs | |
489 | ||
490 | What things can this be used on: | |
491 | ||
492 | @display | |
493 | FIELD_DECLs that are virtual function pointers | |
494 | FIELD_DECLs | |
495 | @end display | |
496 | ||
497 | ||
8d08fdba MS |
498 | @item DECL_NAME |
499 | ||
500 | Has values of: | |
501 | ||
502 | @display | |
503 | 0 for things that don't have names | |
504 | IDENTIFIER_NODEs for TYPE_DECLs | |
505 | @end display | |
506 | ||
507 | @item DECL_IGNORED_P | |
508 | A bit that can be set to inform the debug information output routines in | |
8d2733ca | 509 | the back-end that a certain _DECL node should be totally ignored. |
8d08fdba MS |
510 | |
511 | Used in cases where it is known that the debugging information will be | |
512 | output in another file, or where a sub-type is known not to be needed | |
513 | because the enclosing type is not needed. | |
514 | ||
515 | A compiler constructed virtual destructor in derived classes that do not | |
67cc5fec | 516 | define an explicit destructor that was defined explicitly in a base class |
8d08fdba MS |
517 | has this bit set as well. Also used on __FUNCTION__ and |
518 | __PRETTY_FUNCTION__ to mark that they are ``compiler generated.'' c-decl and | |
519 | c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,'' | |
520 | and ``user-invisible variable.'' | |
521 | ||
522 | Functions built by the C++ front-end such as default destructors, | |
67cc5fec | 523 | virtual destructors and default constructors want to be marked that |
8d08fdba MS |
524 | they are compiler generated, but it is unclear why. |
525 | ||
526 | Currently, it is used in an absolute way in the C++ front-end, as an | |
527 | optimization, to tell the debug information output routines to not | |
528 | generate debugging information that will be output by another separately | |
529 | compiled file. | |
530 | ||
531 | ||
532 | @item DECL_VIRTUAL_P | |
533 | A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is | |
534 | wrong.) Used in VAR_DECLs to indicate that the variable is a vtable. | |
535 | It is also used in FIELD_DECLs for vtable pointers. | |
536 | ||
537 | What things can this be used on: | |
538 | ||
539 | FIELD_DECLs and VAR_DECLs | |
540 | ||
541 | ||
542 | @item DECL_VPARENT | |
543 | Used to point to the parent type of the vtable if there is one, else it | |
544 | is just the type associated with the vtable. Because of the sharing of | |
545 | virtual function tables that goes on, this slot is not very useful, and | |
546 | is in fact, not used in the compiler at all. It can be removed. | |
547 | ||
548 | What things can this be used on: | |
549 | ||
550 | VAR_DECLs that are virtual function tables | |
551 | ||
552 | Has values of: | |
553 | ||
554 | RECORD_TYPEs maybe UNION_TYPEs | |
555 | ||
556 | ||
557 | @item DECL_FCONTEXT | |
558 | Used to find the first baseclass in which this FIELD_DECL is defined. | |
559 | See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT. | |
560 | ||
561 | How it is used: | |
562 | ||
563 | Used when writing out debugging information about vfield and | |
564 | vbase decls. | |
565 | ||
566 | What things can this be used on: | |
567 | ||
568 | FIELD_DECLs that are virtual function pointers | |
569 | FIELD_DECLs | |
570 | ||
571 | ||
572 | @item DECL_REFERENCE_SLOT | |
573 | Used to hold the initializer for the reference. | |
574 | ||
575 | What things can this be used on: | |
576 | ||
577 | PARM_DECLs and VAR_DECLs that have a reference type | |
578 | ||
579 | ||
580 | @item DECL_VINDEX | |
581 | Used for FUNCTION_DECLs in two different ways. Before the structure | |
582 | containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a | |
583 | FUNCTION_DECL in a base class which is the FUNCTION_DECL which this | |
584 | FUNCTION_DECL will replace as a virtual function. When the class is | |
585 | laid out, this pointer is changed to an INTEGER_CST node which is | |
586 | suitable to find an index into the virtual function table. See | |
587 | get_vtable_entry as to how one can find the right index into the virtual | |
588 | function table. The first index, 0, of a virtual function table is not | |
589 | used in the normal way, so the first real index is 1. | |
590 | ||
591 | DECL_VINDEX may be a TREE_LIST, that would seem to be a list of | |
592 | overridden FUNCTION_DECLs. add_virtual_function has code to deal with | |
593 | this when it uses the variable base_fndecl_list, but it would seem that | |
594 | somehow, it is possible for the TREE_LIST to persist until method_call, | |
595 | and it should not. | |
596 | ||
597 | ||
598 | What things can this be used on: | |
599 | ||
600 | FUNCTION_DECLs | |
601 | ||
602 | ||
603 | @item DECL_SOURCE_FILE | |
604 | Identifies what source file a particular declaration was found in. | |
605 | ||
606 | Has values of: | |
607 | ||
608 | "<built-in>" on TYPE_DECLs to mean the typedef is built in | |
609 | ||
610 | ||
611 | @item DECL_SOURCE_LINE | |
612 | Identifies what source line number in the source file the declaration | |
613 | was found at. | |
614 | ||
615 | Has values of: | |
616 | ||
617 | @display | |
618 | 0 for an undefined label | |
619 | ||
620 | 0 for TYPE_DECLs that are internally generated | |
621 | ||
622 | 0 for FUNCTION_DECLs for functions generated by the compiler | |
623 | (not yet, but should be) | |
624 | ||
625 | 0 for ``magic'' arguments to functions, that the user has no | |
626 | control over | |
627 | @end display | |
628 | ||
629 | ||
630 | @item TREE_USED | |
631 | ||
632 | Has values of: | |
633 | ||
634 | 0 for unused labels | |
635 | ||
636 | ||
637 | @item TREE_ADDRESSABLE | |
638 | A flag that is set for any type that has a constructor. | |
639 | ||
640 | ||
641 | @item TREE_COMPLEXITY | |
642 | They seem to be a kludgy way to track recursion, popping, and pushing. They only | |
643 | appear in cp-decl.c and cp-decl2.c, so they are a good candidate for | |
644 | proper fixing, and removal. | |
645 | ||
646 | ||
4dabb379 MS |
647 | @item TREE_HAS_CONSTRUCTOR |
648 | A flag to indicate when a CALL_EXPR represents a call to a constructor. | |
649 | If set, we know that the type of the object is the complete type of the | |
650 | object, and that the value returned is nonnull. When used in this | |
651 | fashion, it is an optimization. Can also be used on SAVE_EXPRs to | |
652 | indicate when they are of fixed type and nonnull. Can also be used on | |
653 | INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor. | |
654 | ||
655 | ||
8d08fdba MS |
656 | @item TREE_PRIVATE |
657 | Set for FIELD_DECLs by finish_struct. But not uniformly set. | |
658 | ||
659 | The following routines do something with PRIVATE access: | |
660 | build_method_call, alter_access, finish_struct_methods, | |
661 | finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType, | |
662 | CWriteUseObject, compute_access, lookup_field, dfs_pushdecl, | |
663 | GNU_xref_member, dbxout_type_fields, dbxout_type_method_1 | |
664 | ||
665 | ||
666 | @item TREE_PROTECTED | |
667 | The following routines do something with PROTECTED access: | |
668 | build_method_call, alter_access, finish_struct, convert_to_aggr, | |
669 | CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject, | |
670 | compute_access, lookup_field, GNU_xref_member, dbxout_type_fields, | |
671 | dbxout_type_method_1 | |
672 | ||
673 | ||
674 | @item TYPE_BINFO | |
675 | Used to get the binfo for the type. | |
676 | ||
677 | Has values of: | |
678 | ||
679 | TREE_VECs that are binfos | |
680 | ||
681 | What things can this be used on: | |
682 | ||
683 | RECORD_TYPEs | |
684 | ||
685 | ||
686 | @item TYPE_BINFO_BASETYPES | |
687 | See also BINFO_BASETYPES. | |
688 | ||
689 | @item TYPE_BINFO_VIRTUALS | |
690 | A unique list of functions for the virtual function table. See also | |
691 | BINFO_VIRTUALS. | |
692 | ||
693 | What things can this be used on: | |
694 | ||
695 | RECORD_TYPEs | |
696 | ||
697 | ||
698 | @item TYPE_BINFO_VTABLE | |
699 | Points to the virtual function table associated with the given type. | |
700 | See also BINFO_VTABLE. | |
701 | ||
702 | What things can this be used on: | |
703 | ||
704 | RECORD_TYPEs | |
705 | ||
706 | Has values of: | |
707 | ||
708 | VAR_DECLs that are virtual function tables | |
709 | ||
710 | ||
711 | @item TYPE_NAME | |
712 | Names the type. | |
713 | ||
714 | Has values of: | |
715 | ||
716 | @display | |
717 | 0 for things that don't have names. | |
718 | should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and | |
719 | ENUM_TYPEs. | |
720 | TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but | |
721 | shouldn't be. | |
722 | TYPE_DECL for typedefs, unsure why. | |
723 | @end display | |
724 | ||
725 | What things can one use this on: | |
726 | ||
727 | @display | |
728 | TYPE_DECLs | |
729 | RECORD_TYPEs | |
730 | UNION_TYPEs | |
731 | ENUM_TYPEs | |
732 | @end display | |
733 | ||
734 | History: | |
735 | ||
736 | It currently points to the TYPE_DECL for RECORD_TYPEs, | |
737 | UNION_TYPEs and ENUM_TYPEs, but it should be history soon. | |
738 | ||
739 | ||
740 | @item TYPE_METHODS | |
741 | Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with | |
742 | @code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a | |
743 | class. | |
744 | ||
745 | ||
746 | @item TYPE_DECL | |
747 | Used to represent typedefs, and used to represent binding layers. | |
748 | ||
749 | Components: | |
750 | ||
751 | DECL_NAME is the name of the typedef. For example, foo would | |
752 | be found in the DECL_NAME slot when @code{typedef int foo;} is | |
753 | seen. | |
754 | ||
755 | DECL_SOURCE_LINE identifies what source line number in the | |
756 | source file the declaration was found at. A value of 0 | |
757 | indicates that this TYPE_DECL is just an internal binding layer | |
67cc5fec | 758 | marker, and does not correspond to a user supplied typedef. |
8d08fdba MS |
759 | |
760 | DECL_SOURCE_FILE | |
761 | ||
762 | @item TYPE_FIELDS | |
763 | A linked list (via @code{TREE_CHAIN}) of member types of a class. The | |
764 | list can contain @code{TYPE_DECL}s, but there can also be other things | |
765 | in the list apparently. See also @code{CLASSTYPE_TAGS}. | |
766 | ||
767 | ||
768 | @item TYPE_VIRTUAL_P | |
769 | A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is | |
770 | a virtual function table or a pointer to one. When used on a | |
771 | @code{FUNCTION_DECL}, indicates that it is a virtual function. When | |
772 | used on an @code{IDENTIFIER_NODE}, indicates that a function with this | |
773 | same name exists and has been declared virtual. | |
774 | ||
775 | When used on types, it indicates that the type has virtual functions, or | |
776 | is derived from one that does. | |
777 | ||
778 | Not sure if the above about virtual function tables is still true. See | |
779 | also info on @code{DECL_VIRTUAL_P}. | |
780 | ||
781 | What things can this be used on: | |
782 | ||
783 | FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs | |
784 | ||
785 | ||
786 | @item VF_BASETYPE_VALUE | |
787 | Get the associated type from the binfo that caused the given vfield to | |
788 | exist. This is the least derived class (the most parent class) that | |
789 | needed a virtual function table. It is probably the case that all uses | |
790 | of this field are misguided, but they need to be examined on a | |
791 | case-by-case basis. See history for more information on why the | |
792 | previous statement was made. | |
793 | ||
794 | Set at @code{finish_base_struct} time. | |
795 | ||
796 | What things can this be used on: | |
797 | ||
798 | TREE_LISTs that are vfields | |
799 | ||
800 | History: | |
801 | ||
802 | This field was used to determine if a virtual function table's | |
803 | slot should be filled in with a certain virtual function, by | |
804 | checking to see if the type returned by VF_BASETYPE_VALUE was a | |
805 | parent of the context in which the old virtual function existed. | |
806 | This incorrectly assumes that a given type _could_ not appear as | |
807 | a parent twice in a given inheritance lattice. For single | |
808 | inheritance, this would in fact work, because a type could not | |
809 | possibly appear more than once in an inheritance lattice, but | |
810 | with multiple inheritance, a type can appear more than once. | |
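The multiple-inheritance case described in the history note can be seen from user code. The following is only a sketch; the class names are made up for illustration and have nothing to do with the compiler's own data structures. It shows one type appearing twice as a parent in a single lattice, with each copy carrying its own virtual function table pointer.

```cpp
#include <cassert>

// Hypothetical class names, chosen only for illustration.
struct Base { virtual ~Base() {} virtual int id() const { return 0; } };
struct Left : Base { int id() const override { return 1; } };
struct Right : Base { int id() const override { return 2; } };

// Non-virtual inheritance: Derived contains two separate Base subobjects.
struct Derived : Left, Right {};

// Returns true when the two Base subobjects of one Derived are distinct
// and each dispatches through its own vtable.
bool base_appears_twice()
{
  Derived d;
  Base *via_left = static_cast<Left *>(&d);
  Base *via_right = static_cast<Right *>(&d);
  return via_left != via_right
         && via_left->id() == 1
         && via_right->id() == 2;
}
```

Any test of the form "is this type a parent of that context" cannot tell the two Base subobjects apart, which is the flaw the note describes.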
811 | ||
812 | ||
813 | @item VF_BINFO_VALUE | |
814 | Identifies the binfo that caused this vfield to exist. If this vfield | |
815 | is from the first direct base class that has a virtual function table, | |
816 | then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the | |
817 | direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL} | |
818 | on result to find out if it is a virtual base class. Related to the | |
819 | binfo found by | |
820 | ||
821 | @example | |
822 | get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) | |
823 | @end example | |
824 | ||
825 | @noindent | |
826 | where @samp{t} is the type that has the given vfield. | |
827 | ||
828 | @example | |
829 | get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) | |
830 | @end example | |
831 | ||
832 | @noindent | |
38e01259 | 833 | will return the binfo for the given vfield. |
8d08fdba MS |
834 | |
835 | May or may not be set at @code{modify_vtable_entries} time. Set at | |
836 | @code{finish_base_struct} time. | |
837 | ||
838 | What things can this be used on: | |
839 | ||
840 | TREE_LISTs that are vfields | |
841 | ||
842 | ||
843 | @item VF_DERIVED_VALUE | |
844 | Identifies the type of the most derived class of the vfield, excluding | |
38e01259 | 845 | the class this vfield is for. |
8d08fdba MS |
846 | |
847 | Set at @code{finish_base_struct} time. | |
848 | ||
849 | What things can this be used on: | |
850 | ||
851 | TREE_LISTs that are vfields | |
852 | ||
853 | ||
854 | @item VF_NORMAL_VALUE | |
855 | Identifies the type of the most derived class of the vfield, including | |
856 | the class this vfield is for. | |
857 | ||
858 | Set at @code{finish_base_struct} time. | |
859 | ||
860 | What things can this be used on: | |
861 | ||
862 | TREE_LISTs that are vfields | |
863 | ||
864 | ||
865 | @item WRITABLE_VTABLES | |
866 | This is an option that can be defined when building the compiler, that | |
867 | will cause the compiler to output vtables into the data segment so that | |
868 | the vtables may be written. This is undefined by default, because | |
869 | normally the vtables should be unwritable. People who implement object | |
870 | I/O facilities, or who want to change the dynamic type of objects, | |
871 | may want to have the vtables writable. Another way of achieving | |
872 | this would be to make a copy of the vtable into writable memory, but the | |
873 | drawback there is that that method only changes the type for one object. | |
874 | ||
875 | @end table | |
876 | ||
877 | @node Typical Behavior, Coding Conventions, Macros, Top | |
878 | @section Typical Behavior | |
879 | ||
880 | @cindex parse errors | |
881 | ||
882 | Whenever seemingly normal code fails with errors like | |
883 | @code{syntax error at `\@{'}, it's highly likely that grokdeclarator is | |
884 | returning a NULL_TREE for whatever reason. | |
885 | ||
886 | @node Coding Conventions, Templates, Typical Behavior, Top | |
887 | @section Coding Conventions | |
888 | ||
889 | It should never be the case that trees are modified in-place by the | |
890 | back-end, @emph{unless} it is guaranteed that the semantics are the same | |
891 | no matter how shared the tree structure is. @file{fold-const.c} still | |
892 | has some cases where this is not true, but rms hypothesizes that this | |
893 | will never be a problem. | |
894 | ||
895 | @node Templates, Access Control, Coding Conventions, Top | |
896 | @section Templates | |
897 | ||
f30432d7 MS |
898 | A template is represented by a @code{TEMPLATE_DECL}. The specific |
899 | fields used are: | |
8d08fdba | 900 | |
f30432d7 MS |
901 | @table @code |
902 | @item DECL_TEMPLATE_RESULT | |
903 | The generic decl on which instantiations are based. This looks just | |
904 | like any other decl. | |
8d08fdba | 905 | |
f30432d7 MS |
906 | @item DECL_TEMPLATE_PARMS |
907 | The parameters to this template. | |
908 | @end table | |
8d08fdba | 909 | |
f30432d7 MS |
910 | The generic decl is parsed as much like any other decl as possible, |
911 | given the parameterization. The template decl is not built up until the | |
912 | generic decl has been completed. For template classes, a template decl | |
913 | is generated for each member function and static data member, as well. | |
8d08fdba | 914 | |
f30432d7 MS |
915 | Template members of template classes are represented by a TEMPLATE_DECL |
916 | for the class' parameters around another TEMPLATE_DECL for the member's | |
917 | parameters. | |
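As a source-level illustration of the nesting just described, here is a small sketch; the names Box and combined_size are hypothetical. The member is parameterized by its own U inside a class parameterized by T, which is the situation that calls for one TEMPLATE_DECL wrapped around another.

```cpp
#include <cassert>
#include <cstddef>

// Outer level of parameters: T belongs to the class template.
template <class T>
struct Box {
  T value;

  // Inner level of parameters: U belongs to the member template only.
  template <class U>
  std::size_t combined_size(const U &) const
  {
    return sizeof(T) + sizeof(U);
  }
};

// Instantiates the class with T = char, then the member with U = short.
std::size_t demo_nested_parameters()
{
  Box<char> b = { 'x' };
  return b.combined_size(static_cast<short>(0));
}
```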
918 | ||
919 | All declarations that are instantiations or specializations of templates | |
920 | refer to their template and parameters through DECL_TEMPLATE_INFO. | |
921 | ||
922 | How should I handle parsing member functions with the proper param | |
923 | decls? Set them up again or try to use the same ones? Currently we do | |
924 | the former. We can probably do this without any extra machinery in | |
925 | store_pending_inline, by deducing the parameters from the decl in | |
926 | do_pending_inlines. PRE_PARSED_TEMPLATE_DECL? | |
927 | ||
928 | If a base is a parm, we can't check anything about it. If a base is not | |
929 | a parm, we need to check it for name binding. Do finish_base_struct if | |
930 | no bases are parameterized (only if none, including indirect, are | |
931 | parms). Nah, don't bother trying to do any of this until instantiation | |
932 | -- we only need to do name binding in advance. | |
933 | ||
934 | Always set up method vec and fields, inc. synthesized methods. Really? | |
935 | We can't know the types of the copy folks, or whether we need a | |
936 | destructor, or can have a default ctor, until we know our bases and | |
937 | fields. Otherwise, we can assume and fix ourselves later. Hopefully. | |
8d08fdba MS |
938 | |
939 | @node Access Control, Error Reporting, Templates, Top | |
940 | @section Access Control | |
941 | The function compute_access returns one of three values: | |
942 | ||
943 | @table @code | |
944 | @item access_public | |
945 | means that the field can be accessed by the current lexical scope. | |
946 | ||
947 | @item access_protected | |
948 | means that the field cannot be accessed by the current lexical scope | |
949 | because it is protected. | |
950 | ||
951 | @item access_private | |
952 | means that the field cannot be accessed by the current lexical scope | |
953 | because it is private. | |
954 | @end table | |
955 | ||
956 | DECL_ACCESS is used for access declarations; alter_access creates a list | |
957 | of types and accesses for a given decl. | |
958 | ||
959 | Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return | |
960 | codes of compute_access and were used as a cache for compute_access. | |
961 | Now they are not used at all. | |
962 | ||
963 | TREE_PROTECTED and TREE_PRIVATE are used to record the access levels | |
964 | granted by the containing class. BEWARE: TREE_PUBLIC means something | |
965 | completely unrelated to access control! | |
966 | ||
51c184be | 967 | @node Error Reporting, Parser, Access Control, Top |
8d08fdba MS |
968 | @section Error Reporting |
969 | ||
8d2733ca | 970 | The C++ front-end uses a call-back mechanism to allow functions to print |
8d08fdba MS |
971 | out reasonable strings for types and functions without putting extra |
972 | logic in the functions where errors are found. The interface is through | |
973 | the @code{cp_error} function (or @code{cp_warning}, etc.). The | |
974 | syntax is exactly like that of @code{error}, except that a few more | |
975 | conversions are supported: | |
976 | ||
977 | @itemize @bullet | |
978 | @item | |
979 | %C indicates a value of `enum tree_code'. | |
980 | @item | |
981 | %D indicates a *_DECL node. | |
982 | @item | |
983 | %E indicates a *_EXPR node. | |
984 | @item | |
985 | %L indicates a value of `enum languages'. | |
986 | @item | |
987 | %P indicates the name of a parameter (i.e. "this", "1", "2", ...) | |
988 | @item | |
989 | %T indicates a *_TYPE node. | |
990 | @item | |
991 | %O indicates the name of an operator (MODIFY_EXPR -> "operator ="). | |
992 | ||
993 | @end itemize | |
994 | ||
995 | There is some overlap between these; for instance, any of the node | |
996 | options can be used for printing an identifier (though only @code{%D} | |
997 | tries to decipher function names). | |
998 | ||
999 | For a more verbose message (@code{class foo} as opposed to just @code{foo}, | |
1000 | including the return type for functions), use @code{%#c}. | |
1001 | To have the line number on the error message indicate the line of the | |
1002 | DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want, | |
1003 | use @code{%+D}, or it will default to the first. | |
1004 | ||
51c184be MS |
1005 | @node Parser, Copying Objects, Error Reporting, Top |
1006 | @section Parser | |
1007 | ||
1008 | Some comments on the parser: | |
1009 | ||
1010 | The @code{after_type_declarator} / @code{notype_declarator} hack is | |
1011 | necessary in order to allow redeclarations of @code{TYPENAME}s, for | |
1012 | instance | |
1013 | ||
1014 | @example | |
1015 | typedef int foo; | |
1016 | class A @{ | |
1017 | char *foo; | |
1018 | @}; | |
1019 | @end example | |
1020 | ||
1021 | In the above, the first @code{foo} is parsed as a @code{notype_declarator}, | |
1022 | and the second as an @code{after_type_declarator}. | |
1023 | ||
1024 | Ambiguities: | |
1025 | ||
1026 | There are currently four reduce/reduce ambiguities in the parser. They are: | |
1027 | ||
1028 | 1) Between @code{template_parm} and | |
1029 | @code{named_class_head_sans_basetype}, for the tokens @code{aggr | |
1030 | identifier}. This situation occurs in code looking like | |
1031 | ||
1032 | @example | |
1033 | template <class T> class A @{ @}; | |
1034 | @end example | |
1035 | ||
1036 | It is ambiguous whether @code{class T} should be parsed as the | |
1037 | declaration of a template type parameter named @code{T} or an unnamed | |
1038 | constant parameter of type @code{class T}. Section 14.6, paragraph 3 of | |
1039 | the January '94 working paper states that the first interpretation is | |
a28e3c7f | 1040 | the correct one. This ambiguity results in two reduce/reduce conflicts. |
51c184be | 1041 | |
a28e3c7f | 1042 | 2) Between @code{primary} and @code{type_id} for code like @samp{int()} |
51c184be MS |
1043 | in places where both can be accepted, such as the argument to |
1044 | @code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies | |
1045 | that these ambiguous constructs will be interpreted as @code{typename}s. | |
a28e3c7f MS |
1046 | This ambiguity results in six reduce/reduce conflicts between |
1047 | @samp{absdcl} and @samp{functional_cast}. | |
51c184be | 1048 | |
a28e3c7f MS |
1049 | 3) Between @code{functional_cast} and |
1050 | @code{complex_direct_notype_declarator}, for various token strings. | |
1051 | This situation occurs in code looking like | |
51c184be MS |
1052 | |
1053 | @example | |
1054 | int (*a); | |
1055 | @end example | |
1056 | ||
1057 | This code is ambiguous; it could be a declaration of the variable | |
1058 | @samp{a} as a pointer to @samp{int}, or it could be a functional cast of | |
1059 | @samp{*a} to @samp{int}. Section 6.8 specifies that the former | |
a28e3c7f MS |
1060 | interpretation is correct. This ambiguity results in 7 reduce/reduce |
1061 | conflicts. Another aspect of this ambiguity is code like 'int (x[2]);', | |
1062 | which is resolved at the '[' and accounts for 6 reduce/reduce conflicts | |
1063 | between @samp{direct_notype_declarator} and | |
1064 | @samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r | |
1065 | conflicts between @samp{expr_or_declarator} and @samp{primary} over code | |
1066 | like 'int (a);', which could probably be resolved but would also | |
1067 | probably be more trouble than it's worth. In all, this situation | |
1068 | accounts for 17 conflicts. Ack! | |
1069 | ||
1070 | The second case above is responsible for the failure to parse 'LinppFile | |
1071 | ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave | |
1072 | Math.h++) as an object declaration, and must be fixed so that it does | |
1073 | not resolve until later. | |
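A short sketch of how the third ambiguity resolves in practice, assuming a conforming compiler; the function name is made up. Both parenthesized forms are taken as declarations, which the code relies on by using the declared names afterwards.

```cpp
#include <cassert>

int declaration_wins()
{
  int value = 7;

  int (*a);      // declares a as a pointer to int, not a cast of *a to int
  a = &value;

  int (x[2]);    // declares x as an array of two ints
  x[0] = *a;
  x[1] = 35;

  return x[0] + x[1];
}
```

If either statement were parsed as an expression, the later uses of a and x would fail to compile, since neither name would have been declared.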
1074 | ||
1075 | 4) Indirectly between @code{after_type_declarator} and @code{parm}, for | |
1076 | type names. This occurs in (as one example) code like | |
51c184be MS |
1077 | |
1078 | @example | |
1079 | typedef int foo, bar; | |
1080 | class A @{ | |
1081 | foo (bar); | |
1082 | @}; | |
1083 | @end example | |
1084 | ||
1085 | What is @code{bar} inside the class definition? We currently interpret | |
1086 | it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an | |
a28e3c7f | 1087 | @code{after_type_declarator}. I believe that xlC is correct, in light |
51c184be MS |
1088 | of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that |
1089 | could possibly be a type name is taken as the @i{decl-specifier-seq} of | |
1090 | a @i{declaration}." However, it seems clear that this rule must be | |
a28e3c7f MS |
1091 | violated in the case of constructors. This ambiguity accounts for 8 |
1092 | conflicts. | |
51c184be MS |
1093 | |
1094 | Unlike the others, this ambiguity is not recognized by the Working Paper. | |
1095 | ||
8d2733ca | 1096 | @node Copying Objects, Exception Handling, Parser, Top |
51c184be MS |
1097 | @section Copying Objects |
1098 | ||
1099 | The generated copy assignment operator in g++ does not currently do the | |
1100 | right thing for multiple inheritance involving virtual bases; it just | |
1101 | calls the copy assignment operators for its direct bases. What it | |
1102 | should probably do is: | |
1103 | ||
1104 | 1) Split up the copy assignment operator for all classes that have | |
1105 | vbases into "copy my vbases" and "copy everything else" parts. Or do | |
1106 | the trickiness that the constructors do to ensure that vbases don't get | |
1107 | initialized by intermediate bases. | |
1108 | ||
1109 | 2) Wander through the class lattice, find all vbases for which no | |
1110 | intermediate base has a user-defined copy assignment operator, and call | |
1111 | their "copy everything else" routines. If not all of my vbases satisfy | |
1112 | this criterion, warn, because this may be surprising behavior. | |
1113 | ||
1114 | 3) Call the "copy everything else" routine for my direct bases. | |
1115 | ||
1116 | If we only have one direct base, we can just foist everything off onto | |
1117 | them. | |
1118 | ||
1119 | This issue is currently under discussion in the core reflector | |
1120 | (2/28/94). | |
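The behavior in question can be observed with a small diamond. This is a sketch with hypothetical class names and a counter added purely for illustration. The C++ standard leaves it unspecified whether a virtual base is assigned more than once by an implicitly defined copy assignment operator, so the function reports the count without assuming a particular value.

```cpp
#include <cassert>

static int vbase_assignments = 0;   // illustration-only counter

struct VBase {
  int v;
  VBase() : v(0) {}
  VBase &operator=(const VBase &o)
  {
    v = o.v;
    ++vbase_assignments;            // record each assignment of the vbase
    return *this;
  }
};

struct Left : virtual VBase {};
struct Right : virtual VBase {};
struct Bottom : Left, Right {};     // one shared VBase subobject

// Returns how many times the single VBase subobject was assigned while
// copying one Bottom into another with the generated operator=.
int count_vbase_assignments()
{
  Bottom a, b;
  a.v = 1;
  vbase_assignments = 0;
  b = a;
  return vbase_assignments;
}
```

A count above one means the generated operator simply called the operators of its direct bases, each of which copied the shared vbase again; the "copy my vbases" / "copy everything else" split proposed above would bring it down to one.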
1121 | ||
f0e01782 | 1122 | @node Exception Handling, Free Store, Copying Objects, Top |
8d2733ca MS |
1123 | @section Exception Handling |
1124 | ||
a3b49ccd MS |
1125 | Note, exception handling in g++ is still under development. |
1126 | ||
8d2733ca MS |
1127 | This section describes the mapping of C++ exceptions in the C++ |
1128 | front-end, into the back-end exception handling framework. | |
1129 | ||
1130 | The basic mechanism of exception handling in the back-end is | |
1131 | unwind-protect a la elisp. This is a general, robust, and language | |
1132 | independent representation for exceptions. | |
1133 | ||
1134 | The C++ front-end exceptions are mapped into the unwind-protect | |
1135 | semantics by the C++ front-end. The mapping is described below. | |
1136 | ||
e8abc66f MS |
1137 | When -frtti is used, rtti is used to do exception object type checking; | |
1138 | when it isn't used, the encoded name for the type of the object being | |
1139 | thrown is used instead. All code that originates exceptions, even code | |
1140 | that throws exceptions as a side effect, like dynamic casting, and all | |
1141 | code that catches exceptions must be compiled with either -frtti, or | |
1142 | -fno-rtti. It is not possible to mix rtti-based exception handling | |
5156628f MS |
1143 | objects with code that doesn't use rtti. The exceptions to this are | |
1144 | code that doesn't catch or throw exceptions, catch (...), and code that | |
1145 | just rethrows an exception. | |
e8abc66f MS |
1146 | |
1147 | Currently we use the normal mangling used in building function names | |
1148 | (int's are "i", const char * is PCc) to build the non-rtti base type | |
1149 | descriptors for exception handling. These descriptors are just plain | |
1150 | NULL terminated strings, and internally they are passed around as char | |
1151 | *. | |
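Since the descriptors are plain strings, matching them can be pictured as a string comparison. The helper below is a hypothetical sketch, not the runtime's actual entry point, and it assumes exact matching only.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: compares two non-rtti type descriptors, each a
// NUL-terminated mangled type name such as "i" or "PCc".
bool descriptors_match(const char *thrown, const char *handler)
{
  return std::strcmp(thrown, handler) == 0;
}
```

Under this assumption a handler for int matches a thrown int, but nothing that would need a conversion can match, because the two strings differ.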
8d2733ca MS |
1152 | |
1153 | In C++, all cleanups should be protected by exception regions. The | |
1154 | region starts just after the reason why the cleanup is created has | |
1155 | ended. For example, with an automatic variable, that has a constructor, | |
1156 | it would be right after the constructor is run. The region ends just | |
1157 | before the finalization is expanded. Since the backend may expand the | |
1158 | cleanup multiple times along different paths, once for normal end of the | |
1159 | region, once for non-local gotos, once for returns, etc, the backend | |
1160 | must take special care to protect the finalization expansion, if the | |
1161 | expansion is for any other reason than normal region end, and it is | |
1162 | `inline' (it is inside the exception region). The backend can either | |
1163 | choose to move them out of line, or it can create an exception region | |
1164 | over the finalization to protect it, and in the handler associated with | |
1165 | it, it would not run the finalization as it otherwise would have, but | |
1166 | rather just rethrow to the outer handler, careful to skip the normal | |
1167 | handler for the original region. | |
1168 | ||
1169 | In Ada, they will use the more runtime intensive approach of having | |
1170 | fewer regions, but at the cost of additional work at run time, to keep a | |
1171 | list of things that need cleanups. When a variable has finished | |
1172 | construction, they add the cleanup to the list; when they come to the end | |
1173 | of the lifetime of the variable, they run the list down. If they take a | |
1174 | hit before the section finishes normally, they examine the list for | |
1175 | actions to perform. I hope they add this logic into the back-end, as it | |
1176 | would be nice to get that alternative approach in C++. | |
1177 | ||
a3b49ccd MS |
1178 | On an rs6000, xlC stores exception objects on the stack, under the try | |
1179 | block. When it unwinds down into a handler, the frame pointer is | |
1180 | adjusted back to the normal value for the frame in which the handler | |
1181 | resides, and the stack pointer is left unchanged from the time at which | |
db5ae43f | 1182 | the object was thrown. This is so that there is always someplace for |
a3b49ccd MS |
1183 | the exception object, and nothing can overwrite it, once we start |
1184 | throwing. The only bad part, is that the stack remains large. | |
1185 | ||
f30432d7 MS |
1186 | The below points out some things that work in g++'s exception handling. |
1187 | ||
1188 | All completely constructed temps and local variables are cleaned up in | |
1189 | all unwound scopes. Completely constructed parts of partially | |
1190 | constructed objects are cleaned up. This includes partially built | |
be99da77 | 1191 | arrays. Exception specifications are now handled. Thrown objects are |
a50f0918 MS |
1192 | now cleaned up all the time. We can now tell if we have an active |
1193 | exception being thrown or not (__eh_type != 0). We use this to call | |
1194 | terminate if someone does a throw; without there being an active | |
0021b564 JM |
1195 | exception object. uncaught_exception () works. Exception handling |
1196 | should work right if you optimize. Exception handling should work with | |
1197 | -fpic or -fPIC. | |
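The cleanup of completely constructed parts of a partially constructed object can be checked from user code. This sketch uses hypothetical class names and a log string added for illustration. It records the order of events when a constructor throws after both members have been built.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

static std::string event_log;       // illustration-only trace of events

struct Part {
  char tag;
  explicit Part(char t) : tag(t) { event_log += t; }
  ~Part() { event_log += '~'; event_log += tag; }
};

struct Whole {
  Part first;
  Part second;
  // Both members are fully constructed before the body throws, so both
  // must be destroyed during unwinding; ~Whole itself never runs.
  Whole() : first('a'), second('b') { throw std::runtime_error("ctor failed"); }
};

std::string run_partial_construction()
{
  event_log.clear();
  try {
    Whole w;
  } catch (const std::runtime_error &) {
    event_log += '!';               // handler entered after the cleanups
  }
  return event_log;
}
```

The members are destroyed in reverse order of construction, and only then is the handler entered.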
f30432d7 | 1198 | |
6060a796 MS |
1199 | The below points out some flaws in g++'s exception handling, as it now |
1200 | stands. | |
1201 | ||
e8abc66f | 1202 | Only exact type matching or reference matching of throw types works when |
0021b564 JM |
1203 | -fno-rtti is used. Only works on a SPARC (like Suns) (both -mflat and |
1204 | -mno-flat models work), SPARClite, Hitachi SH, i386, arm, rs6000, | |
1205 | PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9 may not | |
1206 | work. HPPA is mostly done, but throwing between a shared library and | |
1207 | user code doesn't yet work. Some targets have support for data-driven | |
1208 | unwinding. Partial support is in for all other machines, but a stack | |
1209 | unwinder called __unwind_function has to be written, and added to | |
1210 | libgcc2 for them. The new EH code doesn't rely upon the | |
1211 | __unwind_function for C++ code, instead it creates per function | |
1212 | unwinders right inside the function, unfortunately, on many platforms | |
1213 | the definition of RETURN_ADDR_RTX in the tm.h file for the machine port | |
1214 | is wrong. See below for details on __unwind_function. RTL_EXPRs for EH | |
1215 | cond variables for && and || exprs should probably be wrapped in | |
1216 | UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved. | |
f30432d7 MS |
1217 | |
1218 | We only do pointer conversions on exception matching a la 15.3 p2 case | |
1219 | 3: `A handler with type T, const T, T&, or const T& is a match for a | |
1220 | throw-expression with an object of type E if [3]T is a pointer type and | |
1221 | E is a pointer type that can be converted to T by a standard pointer | |
1222 | conversion (_conv.ptr_) not involving conversions to pointers to private | |
1223 | or protected base classes.' when -frtti is given. | |
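The pointer-conversion case quoted above looks like this at the source level. It is a sketch with made-up class names, and it assumes the code is compiled with rtti enabled.

```cpp
#include <cassert>

struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};

// Returns 1 when the Shape* handler is chosen, 2 when only the
// catch-all handler matches.
int which_handler_catches()
{
  static Circle circle;
  try {
    throw &circle;            // the thrown object has type Circle*
  } catch (Shape *) {         // matches through a derived-to-base conversion
    return 1;
  } catch (...) {
    return 2;
  }
}
```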
1224 | ||
1225 | We don't call delete on new expressions that die because the ctor threw | |
1226 | an exception. See except/18 for a test case. | |
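For comparison, here is a sketch of what a conforming implementation is expected to do in this situation. It is not the except/18 test case, and the class name and counter are hypothetical. Storage obtained by the new expression is released again when the constructor exits by throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Tracked {
  static int live_blocks;     // illustration-only allocation counter

  static void *operator new(std::size_t n)
  {
    ++live_blocks;
    return ::operator new(n);
  }
  static void operator delete(void *p)
  {
    --live_blocks;
    ::operator delete(p);
  }

  Tracked() { throw 1; }      // construction always fails
};

int Tracked::live_blocks = 0;

// Returns the number of blocks still allocated after the failed new.
int leaked_blocks_after_throwing_ctor()
{
  try {
    (void) new Tracked;
  } catch (int) {
  }
  return Tracked::live_blocks;
}
```

A nonzero result would indicate the leak described here.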
1227 | ||
1228 | 15.2 para 13: The exception being handled should be rethrown if control | |
1229 | reaches the end of a handler of the function-try-block of a constructor | |
1230 | or destructor, right now, it is not. | |
1231 | ||
1232 | 15.2 para 12: If a return statement appears in a handler of | |
1233 | function-try-block of a constructor, the program is ill-formed, but this | |
1234 | isn't diagnosed. | |
1235 | ||
1236 | 15.2 para 11: If the handlers of a function-try-block contain a jump | |
1237 | into the body of a constructor or destructor, the program is ill-formed, | |
1238 | but this isn't diagnosed. | |
1239 | ||
1240 | 15.2 para 9: Check that the fully constructed base classes and members | |
1241 | of an object are destroyed before entering the handler of a | |
1242 | function-try-block of a constructor or destructor for that object. | |
1243 | ||
1244 | build_exception_variant should sort the incoming list, so that it | |
6060a796 MS |
1245 | implements set compares, not exact list equality. Type smashing should |
1246 | smash exception specifications using set union. | |
1247 | ||
be99da77 MS |
1248 | Thrown objects are usually allocated on the heap, in the usual way. If |
1249 | one runs out of heap space, throwing an object will probably never work. | |
1250 | This could be relaxed some by passing an __in_chrg parameter to track | |
1251 | who has control over the exception object. Thrown objects are not | |
1252 | allocated on the heap when they are pointer to object types. We should | |
1253 | extend it so that all small (<4*sizeof(void*)) objects are stored | |
1254 | directly, instead of allocated on the heap. | |
8ccc31eb MS |
1255 | |
1256 | When the backend returns a value, it can create new exception regions | |
1257 | that need protecting. The new region should rethrow the object in | |
1258 | context of the last associated cleanup that ran to completion. | |
a3b49ccd | 1259 | |
f30432d7 MS |
1260 | The structure of the code that is generated for C++ exception handling |
1261 | code is shown below: | |
1262 | ||
1263 | @example | |
1264 | Ln: throw value; | |
1265 | copy value onto heap | |
1266 | jump throw (Ln, id, address of copy of value on heap) | |
1267 | ||
cffa8729 | 1268 | try @{ |
f30432d7 MS |
1269 | +Lstart: the start of the main EH region |
1270 | |... ... | |
1271 | +Lend: the end of the main EH region | |
cffa8729 | 1272 | @} catch (T o) @{ |
f30432d7 | 1273 | ...1 |
cffa8729 | 1274 | @} |
f30432d7 MS |
1275 | Lresume: |
1276 | nop used to make sure there is something before | |
1277 | the next region ends, if there is one | |
1278 | ... ... | |
1279 | ||
1280 | jump Ldone | |
1281 | [ | |
1282 | Lmainhandler: handler for the region Lstart-Lend | |
1283 | cleanup | |
1284 | ] zero or more, depending upon automatic vars with dtors | |
1285 | +Lpartial: | |
1286 | | jump Lover | |
1287 | +Lhere: | |
1288 | rethrow (Lhere, same id, same obj); | |
1289 | Lterm: handler for the region Lpartial-Lhere | |
1290 | call terminate | |
1291 | Lover: | |
1292 | [ | |
1293 | [ | |
1294 | call throw_type_match | |
cffa8729 | 1295 | if (eq) @{ |
f30432d7 MS |
1296 | ] these lines disappear when there is no catch condition |
1297 | +Lsregion2: | |
1298 | | ...1 | |
1299 | | jump Lresume | |
1300 | |Lhandler: handler for the region Lsregion2-Leregion2 | |
1301 | | rethrow (Lresume, same id, same obj); | |
1302 | +Leregion2 | |
cffa8729 | 1303 | @} |
f30432d7 MS |
1304 | ] there are zero or more of these sections, depending upon how many |
1305 | catch clauses there are | |
1306 | ----------------------------- expand_end_all_catch -------------------------- | |
1307 | here we have fallen off the end of all catch | |
1308 | clauses, so we rethrow to outer | |
1309 | rethrow (Lresume, same id, same obj); | |
1310 | ----------------------------- expand_end_all_catch -------------------------- | |
1311 | [ | |
1312 | L1: maybe throw routine | |
1313 | ] depending upon if we have expanded it or not | |
1314 | Ldone: | |
1315 | ret | |
1316 | ||
1317 | start_all_catch emits labels: Lresume, | |
1318 | ||
cffa8729 | 1319 | @end example |
f30432d7 | 1320 | |
e8abc66f MS |
1321 | The __unwind_function takes a pointer to the throw handler, and is |
1322 | expected to pop the stack frame that was built to call it, as well as | |
f30432d7 MS |
1323 | the frame underneath and then jump to the throw handler. It must |
1324 | restore all registers to their proper values as well as all other | |
1325 | machine state as determined by the context in which we are unwinding | |
1326 | into. The way I normally start is to compile: | |
1327 | ||
1328 | void *g; | |
cffa8729 | 1329 | foo(void* a) @{ g = a; @} |
f30432d7 MS |
1330 | |
1331 | with -S, and change the thing that alters the PC (return, or ret | |
1332 | usually) to not alter the PC, making sure to leave all other semantics | |
1333 | (like adjusting the stack pointer, or frame pointers) in. After that, | |
1334 | replicate the prologue once more at the end, again, changing the PC | |
1335 | altering instructions, and finally, at the very end, jump to `g'. | |
1336 | ||
1337 | It takes about a week to write this routine, if someone wants to | |
1338 | volunteer to write this routine for any architecture, exception support | |
1339 | for that architecture will be added to g++. Please send in those code | |
1340 | donations. One other thing that needs to be done, is to double check | |
1341 | that __builtin_return_address (0) works. | |
1342 | ||
1343 | @subsection Specific Targets | |
e8abc66f | 1344 | |
f30432d7 MS |
1345 | For the alpha, the __unwind_function will be something resembling: |
1346 | ||
1347 | @example | |
1348 | void | |
1349 | __unwind_function(void *ptr) | |
1350 | @{ | |
1351 | /* First frame */ | |
1352 | asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */ | |
1353 | asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ | |
1354 | ||
1355 | /* Second frame */ | |
1356 | asm ("ldq $15, 8($30)"); /* fp */ | |
1357 | asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ | |
1358 | ||
1359 | /* Return */ | |
1360 | asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */ | |
1361 | @} | |
1362 | @end example | |
1363 | ||
1364 | @noindent | |
1365 | However, there are a few problems preventing it from working. First of | |
1366 | all, the gcc-internal function @code{__builtin_return_address} needs to | |
1367 | work given an argument of 0 for the alpha. As it stands as of August | |
1368 | 30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c} | |
1369 | will definitely not work on the alpha. Instead, we need to define | |
1370 | the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe), | |
1371 | @code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new | |
1372 | definition for @code{RETURN_ADDR_RTX}. | |
1373 | ||
1374 | In addition (and more importantly), we need a way to reliably find the | |
1375 | frame pointer on the alpha. The use of the value 8 above to restore the | |
1376 | frame pointer (register 15) is incorrect. On many systems, the frame | |
1377 | pointer is consistently offset to a specific point on the stack. On the | |
1378 | alpha, however, the frame pointer is pushed last. First the return | |
1379 | address is stored, then any other registers are saved (e.g., @code{s0}), | |
1380 | and finally the frame pointer is put in place. So @code{fp} could have | |
1381 | an offset of 8, but if the calling function saved any registers at all, | |
1382 | they add to the offset. | |
1383 | ||
1384 | The only places the frame size is noted are with the @samp{.frame} | |
1385 | directive, for use by the debugger and the OSF exception handling model | |
1386 | (useless to us), and in the initial computation of the new value for | |
1387 | @code{sp}, the stack pointer. For example, the function may start with: | |
1388 | ||
1389 | @example | |
1390 | lda $30,-32($30) | |
1391 | .frame $15,32,$26,0 | |
1392 | @end example | |
1393 | ||
1394 | @noindent | |
1395 | The 32 above is exactly the value we need. With this, we can be sure | |
1396 | that the frame pointer is stored 8 bytes less---in this case, at 24(sp). | |
1397 | The drawback is that there is no way that I (Brendan) have found to let | |
1398 | us discover the size of a previous frame @emph{inside} the definition | |
1399 | of @code{__unwind_function}. | |
1400 | ||
1401 | So to accomplish exception handling support on the alpha, we need two | |
1402 | things: first, a way to figure out where the frame pointer was stored, | |
1403 | and second, a functional @code{__builtin_return_address} implementation | |
1404 | for except.c to be able to use it. | |
1405 | ||
0021b564 JM |
1406 | Or just support DWARF 2 unwind info. |
1407 | ||
1408 | @subsection New Backend Exception Support | |
1409 | ||
1410 | This subsection discusses various aspects of the design of the | |
1411 | data-driven model being implemented for the exception handling backend. | |
1412 | ||
1413 | The goal is to generate enough data during the compilation of user code, | |
1414 | such that we can dynamically unwind through functions at run time with a | |
1415 | single routine (@code{__throw}) that lives in libgcc.a, built by the | |
1416 | compiler, and dispatch into associated exception handlers. | |
1417 | ||
1418 | This information is generated by the DWARF 2 debugging backend, and | |
1419 | includes all of the information __throw needs to unwind an arbitrary | |
1420 | frame. It specifies where all of the saved registers and the return | |
1421 | address can be found at any point in the function. | |
1422 | ||
1423 | Major disadvantages when enabling exceptions are: | |
1424 | ||
1425 | @itemize @bullet | |
1426 | @item | |
1427 | Code cannot keep values in caller-saved registers when flow can be | |
956d6950 | 1428 | transferred into that code from an exception handler. In high performance |
0021b564 JM |
1429 | code this should not usually be the case, so the effects should be minimal. | |
1430 | ||
1431 | @end itemize | |
1432 | ||
f30432d7 | 1433 | @subsection Backend Exception Support |
e8abc66f MS |
1434 | |
1435 | The backend must be extended to fully support exceptions. Right now | |
1436 | there are a few hooks from the backend into the alpha exception handling | |
1437 | backend that resides in the C++ frontend, and these hooks allow exception | |
1438 | handling to work in g++. An exception region is a segment of generated | |
1439 | code that has a handler associated with it. The exception regions are | |
1440 | represented in the generated code as address ranges, each given by a | |
1441 | starting PC value and an ending PC value. Some of the limitations | |
1442 | with this scheme are: | |
1443 | ||
1444 | @itemize @bullet | |
1445 | @item | |
1446 | The backend replicates insns for such things as loop unrolling and | |
1447 | function inlining. Right now, there are no hooks into the frontend's | |
1448 | exception handling backend to handle the replication of insns. When | |
1449 | replication happens, a new exception region descriptor needs to be | |
1450 | generated for the new region. | |
1451 | ||
1452 | @item | |
1453 | The backend expects to be able to rearrange code, for things like jump | |
1454 | optimization. Any rearranging of the code needs to have exception region | |
1455 | descriptors updated appropriately. | |
1456 | ||
1457 | @item | |
1458 | The backend can eliminate dead code. Any associated exception region | |
1459 | descriptor that refers to fully contained code that has been eliminated | |
1460 | should also be removed, although not doing this is harmless in terms of | |
1461 | semantics. | |
1462 | ||
cffa8729 | 1463 | @end itemize |
e8abc66f MS |
1464 | |
1465 | The above is not meant to be exhaustive, but does include all things I | |
1466 | have thought of so far. I am sure other limitations exist. | |
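
The address-range representation described above can be pictured with a
small lookup sketch. This is only an illustration: the struct, its field
names, the half-open range convention, and the "narrowest range wins"
rule for nested regions are all assumptions, not the layout g++ uses.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical descriptor: one exception region as a PC range plus handler.
struct ExceptionRegion {
    std::uintptr_t start_pc;   // first address covered by the region
    std::uintptr_t end_pc;     // one past the last address (assumed half-open)
    std::uintptr_t handler_pc; // address of the associated handler
};

// Return the narrowest region containing pc, or nullptr if none does.
// Assumes nested regions are fully contained in their enclosing regions,
// so the narrowest match is the innermost one.
const ExceptionRegion *find_region(const std::vector<ExceptionRegion> &table,
                                   std::uintptr_t pc)
{
    const ExceptionRegion *best = nullptr;
    for (const ExceptionRegion &r : table) {
        if (pc < r.start_pc || pc >= r.end_pc)
            continue; // pc lies outside this region
        if (!best || (r.end_pc - r.start_pc) < (best->end_pc - best->start_pc))
            best = &r;
    }
    return best;
}
```

A table like this is what the listed limitations put at risk: replicated,
rearranged, or deleted insns all change which PC ranges are valid, so the
descriptors have to be regenerated or updated to match.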
1467 | ||
f30432d7 MS |
1468 | Below are some notes on the migration of the exception handling code |
1469 | backend from the C++ frontend to the backend. | |
1470 | ||
1471 | NOTEs are to be used to denote the start of an exception region, and the | |
1472 | end of the region. I presume that the interface used to generate these | |
1473 | notes in the backend would be two functions, start_exception_region and | |
1474 | end_exception_region (or something like that). The frontends are | |
1475 | required to call them in pairs. When marking the end of a region, an | |
1476 | argument can be passed to indicate the handler for the marked region. | |
1477 | This can be passed in many ways, currently a tree is used. Another | |
1478 | possibility would be insns for the handler, or a label that denotes a | |
38e01259 | 1479 | handler. I have a feeling insns might be the best way to pass it. |
f30432d7 | 1480 | Semantics are, if an exception is thrown inside the region, control is |
956d6950 | 1481 | transferred unconditionally to the handler. If control passes through |
f30432d7 MS |
1482 | the handler, then the backend is to rethrow the exception, in the |
1483 | context of the end of the original region. The handler is protected by | |
1484 | the conventional mechanisms; it is the frontend's responsibility to | |
1485 | protect the handler, if special semantics are required. | |
1486 | ||
1487 | This is a very low level view, and it would be nice if the backend | |
1488 | supported a somewhat higher level view in addition to this view. This | |
1489 | higher level could include source line number, name of the source file, | |
1490 | name of the language that threw the exception and possibly the name of | |
1491 | the exception. Kenner may want to rope you into doing more than just | |
1492 | the basics required by C++. You will have to resolve this. He may want | |
1493 | you to support non-local gotos, and to scan first for an exception handler | |
1494 | so that, if none is found, the debugger can be entered without any cleanups | |
1495 | being done. To do this, the backend would have to know the difference | |
1496 | between a cleanup-rethrower and a real handler; it would also have to | |
1497 | have a way to know if a handler `matches' a thrown exception, and this | |
1498 | is frontend specific. | |
1499 | ||
f30432d7 MS |
1500 | The stack unwinder is one of the hardest parts to do. It is highly |
1501 | machine dependent. The form that Kenner seemed to like was a couple of | |
1502 | macros that would do the machine dependent grunt work. One preexisting | |
1503 | function that might be of some use is __builtin_return_address (). One | |
1504 | macro he seemed to want was __builtin_return_address, and the other | |
1505 | would do the hard work of fixing up the registers, adjusting the stack | |
1506 | pointer, frame pointer, arg pointer and so on. | |
1507 | ||
f30432d7 | 1508 | |
42976354 | 1509 | @node Free Store, Mangling, Exception Handling, Top |
f0e01782 MS |
1510 | @section Free Store |
1511 | ||
e9f32eb5 MS |
1512 | @code{operator new []} adds a magic cookie to the beginning of arrays |
1513 | for which the number of elements will be needed by @code{operator delete | |
1514 | []}. These are arrays of objects with destructors and arrays of objects | |
1515 | that define @code{operator delete []} with the optional size_t argument. | |
1516 | This cookie can be examined from a program as follows: | |
f0e01782 MS |
1517 | |
1518 | @example | |
1519 | typedef unsigned long size_t; | |
1520 | extern "C" int printf (const char *, ...); | |
1521 | ||
1522 | size_t nelts (void *p) | |
1523 | @{ | |
1524 | struct cookie @{ | |
1525 | size_t nelts __attribute__ ((aligned (sizeof (double)))); | |
1526 | @}; | |
1527 | ||
1528 | cookie *cp = (cookie *)p; | |
1529 | --cp; | |
1530 | ||
1531 | return cp->nelts; | |
1532 | @} | |
1533 | ||
1534 | struct A @{ | |
1535 | ~A() @{ @} | |
1536 | @}; | |
1537 | ||
1538 | int main() | |
1539 | @{ | |
1540 | A *ap = new A[3]; | |
1541 | printf ("%lu\n", nelts (ap)); | |
1542 | @} | |
1543 | @end example | |
1544 | ||
a5894242 MS |
1545 | @section Linkage |
1546 | The linkage code in g++ is horribly twisted in order to meet two design goals: | |
1547 | ||
1548 | 1) Avoid unnecessary emission of inlines and vtables. | |
1549 | ||
1550 | 2) Support pedantic assemblers like the one in AIX. | |
1551 | ||
1552 | To meet the first goal, we defer emission of inlines and vtables until | |
1553 | the end of the translation unit, where we can decide whether or not they | |
1554 | are needed, and how to emit them if they are. | |
42976354 BK |
1555 | |
1556 | @node Mangling, Concept Index, Free Store, Top | |
1557 | @section Function name mangling for C++ and Java | |
1558 | ||
1559 | Both C++ and Java provide overloaded functions and methods, | |
1560 | which are methods with the same name but different parameter lists. | |
1561 | Selecting the correct version is done at compile time. | |
1562 | Though the overloaded functions have the same name in the source code, | |
1563 | they need to be translated into different assembler-level names, | |
1564 | since typical assemblers and linkers cannot handle overloading. | |
1565 | This process of encoding the parameter types with the method name | |
1566 | into a unique name is called @dfn{name mangling}. The inverse | |
1567 | process is called @dfn{demangling}. | |
1568 | ||
1569 | It is convenient that C++ and Java use compatible mangling schemes, | |
1570 | since this makes life easier for tools such as gdb, and it eases | |
1571 | integration between C++ and Java. | |
1572 | ||
1573 | Note there is also a standard "Java Native Interface" (JNI) which | |
1574 | implements a different calling convention, and uses a different | |
1575 | mangling scheme. The JNI is a rather abstract ABI so Java can call methods | |
1576 | written in C or C++; | |
1577 | we are concerned here about a lower-level interface primarily | |
1578 | intended for methods written in Java, but that can also be used for C++ | |
1579 | (and less easily C). | |
1580 | ||
5427d758 MT |
1581 | Note that on systems that follow BSD tradition, a C identifier @code{var} |
1582 | would get "mangled" into the assembler name @samp{_var}. On such | |
1583 | systems, all other mangled names are also prefixed by a @samp{_} | |
1584 | which is not shown in the following examples. | |
1585 | ||
42976354 BK |
1586 | @subsection Method name mangling |
1587 | ||
1588 | C++ mangles a method by emitting the function name, followed by @code{__}, | |
1589 | followed by encodings of any method qualifiers (such as @code{const}), | |
1590 | followed by the mangling of the method's class, | |
1591 | followed by the mangling of the parameters, in order. | |
1592 | ||
1593 | For example @code{Foo::bar(int, long) const} is mangled | |
1594 | as @samp{bar__C3Fooil}. | |
1595 | ||
1596 | For a constructor, the method name is left out. | |
1597 | That is, @code{Foo::Foo(int, long)} is mangled | |
1598 | as @samp{__3Fooil}. | |
1599 | ||
1600 | GNU Java does the same. | |
1601 | ||
1602 | @subsection Primitive types | |
1603 | ||
1604 | The C++ types @code{int}, @code{long}, @code{short}, @code{char}, | |
1605 | and @code{long long} are mangled as @samp{i}, @samp{l}, | |
1606 | @samp{s}, @samp{c}, and @samp{x}, respectively. | |
1607 | The corresponding unsigned types have @samp{U} prefixed | |
1608 | to the mangling. The type @code{signed char} is mangled @samp{Sc}. | |
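
As a rough illustration of the integer codes just listed, the mapping
can be written as a small lookup. The function name and the use of type
spellings as strings are assumptions made for this sketch; the real
mangler works on tree nodes, not strings.

```cpp
#include <string>

// Hypothetical helper: map a C++ integer type spelling to its mangling code.
// Returns an empty string for spellings this sketch does not know.
std::string mangle_integer_type(const std::string &type)
{
    if (type == "signed char")
        return "Sc"; // the only type that uses the S modifier

    std::string prefix;
    std::string base = type;
    const std::string unsigned_kw = "unsigned ";
    if (base.compare(0, unsigned_kw.size(), unsigned_kw) == 0) {
        prefix = "U"; // unsigned types get a U before the base code
        base.erase(0, unsigned_kw.size());
    }

    if (base == "int")       return prefix + "i";
    if (base == "long")      return prefix + "l";
    if (base == "short")     return prefix + "s";
    if (base == "char")      return prefix + "c";
    if (base == "long long") return prefix + "x";
    return "";
}
```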
1609 | ||
1610 | The C++ and Java floating-point types @code{float} and @code{double} | |
1611 | are mangled as @samp{f} and @samp{d} respectively. | |
1612 | ||
1613 | The C++ @code{bool} type and the Java @code{boolean} type are | |
1614 | mangled as @samp{b}. | |
1615 | ||
1616 | The C++ @code{wchar_t} and the Java @code{char} types are | |
1617 | mangled as @samp{w}. | |
1618 | ||
1619 | The Java integral types @code{byte}, @code{short}, @code{int} | |
1620 | and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i}, | |
1621 | and @samp{x}, respectively. | |
1622 | ||
1623 | C++ code that has included @code{javatypes.h} will mangle | |
1624 | the typedefs @code{jbyte}, @code{jshort}, @code{jint} | |
1625 | and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i}, | |
1626 | and @samp{x}. (This has not been implemented yet.) | |
1627 | ||
1628 | @subsection Mangling of simple names | |
1629 | ||
1630 | A simple class, package, template, or namespace name is | |
1631 | encoded as the number of characters in the name, followed by | |
1632 | the actual characters. Thus the class @code{Foo} | |
1633 | is encoded as @samp{3Foo}. | |
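
Putting the simple-name rule together with the method rule from the
earlier subsection gives the following sketch. The helper names and the
idea of passing already-mangled parameter codes are assumptions for
illustration; only plain alphanumeric names are handled here.

```cpp
#include <string>
#include <vector>

// Encode a plain alphanumeric name as <length><characters>.
std::string encode_simple_name(const std::string &name)
{
    return std::to_string(name.size()) + name;
}

// Hypothetical helper: build a mangled method name from its parts.
// param_codes holds already-mangled parameter types such as "i" or "l".
std::string mangle_method(const std::string &method,
                          const std::string &class_name,
                          const std::vector<std::string> &param_codes,
                          bool is_const)
{
    std::string out = method + "__";
    if (is_const)
        out += "C"; // qualifier precedes the class encoding
    out += encode_simple_name(class_name);
    for (const std::string &code : param_codes)
        out += code; // parameters follow in declaration order
    return out;
}
```

With these helpers, the class @code{Foo} encodes as @samp{3Foo}, and
@code{Foo::bar(int, long) const} comes out as @samp{bar__C3Fooil}.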
1634 | ||
1635 | If any of the characters in the name are not alphanumeric | |
1636 | (i.e., not one of the standard ASCII letters, digits, or '_'), | |
1637 | or the initial character is a digit, then the name is | |
1638 | mangled as a sequence of encoded Unicode letters. | |
1639 | A Unicode encoding starts with a @samp{U} to indicate | |
1640 | that Unicode escapes are used, followed by the number of | |
1641 | bytes used by the Unicode encoding, followed by the bytes | |
1642 | representing the encoding. ASCII letters and | |
1643 | non-initial digits are encoded without change. However, all | |
1644 | other characters (including underscore and initial digits) are | |
1645 | translated into a sequence starting with an underscore, | |
1646 | followed by the big-endian 4-hex-digit lower-case encoding of the character. | |
1647 | ||
1648 | If a method name contains Unicode-escaped characters, the | |
1649 | entire mangled method name is followed by a @samp{U}. | |
1650 | ||
1651 | For example, the method @code{X\u0319::M\u002B(int)} is encoded as | |
1652 | @samp{M_002b__U6X_0319iU}. | |
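
The escaping rule for a single class or namespace name can be sketched
as follows. The function name, the representation of a name as a list of
code points, and the restriction to code points that fit in four hex
digits are assumptions of this sketch.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Encode one name given as Unicode code points (assumed to be <= 0xFFFF).
// Plain names become <length><chars>; others become U<length><escaped chars>.
std::string encode_name(const std::vector<unsigned> &chars)
{
    auto is_letter = [](unsigned c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    };
    auto is_digit = [](unsigned c) { return c >= '0' && c <= '9'; };

    // Escapes are needed for a leading digit or any non-alphanumeric char.
    bool needs_escape = !chars.empty() && is_digit(chars[0]);
    for (unsigned c : chars)
        if (!is_letter(c) && !is_digit(c) && c != '_')
            needs_escape = true;

    std::string body;
    for (std::size_t i = 0; i < chars.size(); ++i) {
        unsigned c = chars[i];
        bool plain = is_letter(c) || (is_digit(c) && i > 0);
        if (!needs_escape || plain) {
            body += static_cast<char>(c);
        } else {
            // Underscore plus four lower-case hex digits, big-endian.
            char buf[8];
            std::snprintf(buf, sizeof buf, "_%04x", c);
            body += buf;
        }
    }
    return (needs_escape ? "U" : "") + std::to_string(body.size()) + body;
}
```

For the class in the example above, @code{X\u0319} becomes
@samp{U6X_0319}: the escaped body @samp{X_0319} is six bytes long.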
1653 | ||
5427d758 | 1654 | |
42976354 BK |
1655 | @subsection Pointer and reference types |
1656 | ||
1657 | A C++ pointer type is mangled as @samp{P} followed by the | |
1658 | mangling of the type pointed to. | |
1659 | ||
1660 | A C++ reference type is mangled as @samp{R} followed by the | |
1661 | mangling of the type referenced. | |
1662 | ||
1663 | A Java object reference type is equivalent | |
1664 | to a C++ pointer parameter, so we mangle such a parameter type | |
1665 | as @samp{P} followed by the mangling of the class name. | |
1666 | ||
61fbdb55 AM |
1667 | @subsection Squangled type compression |
1668 | ||
1669 | Squangling (enabled with the @samp{-fsquangle} option) utilizes | |
1670 | the @samp{B} code to indicate reuse of a previously | |
1671 | seen type within an identifier. Types are recognized in a left to | |
1672 | right manner and given increasing values, which are | |
1673 | appended to the code in the standard manner, i.e., multiple digit numbers | |
1674 | are delimited by @samp{_} characters. A type is considered to be any | |
1675 | non primitive type, regardless of whether it's a parameter, template | |
1676 | parameter, or entire template. Certain codes are considered modifiers | |
1677 | of a type, and are not included as part of the type. These are the | |
1678 | @samp{C}, @samp{V}, @samp{P}, @samp{A}, @samp{R}, and @samp{U} codes, | |
1679 | denoting constant, volatile, pointer, array, reference, and unsigned. | |
1680 | These codes may precede a @samp{B} type in order to make the required | |
1681 | modifications to the type. | |
1682 | ||
1683 | For example: | |
1684 | @example | |
1685 | template <class T> class class1 @{ @}; | |
1686 | ||
1687 | template <class T> class class2 @{ @}; | |
1688 | ||
1689 | class class3 @{ @}; | |
1690 | ||
1691 | int f(class2<class1<class3> > a ,int b, const class1<class3>&c, class3 *d) @{ @} | |
1692 | ||
1693 | B0 -> class2<class1<class3> > | |
1694 | B1 -> class1<class3> | |
1695 | B2 -> class3 | |
1696 | @end example | |
1697 | Produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}. | |
1698 | The int parameter is a basic type, and does not receive a B encoding... | |
1699 | ||
42976354 BK |
1700 | @subsection Qualified names |
1701 | ||
1702 | Both C++ and Java allow a class to be lexically nested inside another | |
1703 | class. C++ also supports namespaces (not yet implemented by G++). | |
1704 | Java also supports packages. | |
1705 | ||
1706 | These are all mangled the same way: First the letter @samp{Q} | |
1707 | indicates that we are emitting a qualified name. | |
1708 | That is followed by the number of parts in the qualified name. | |
1709 | If that number is 9 or less, it is emitted with no delimiters. | |
1710 | Otherwise, an underscore is written before and after the count. | |
1711 | Then follows each part of the qualified name, as described above. | |
1712 | ||
1713 | For example @code{Foo::\u0319::Bar} is encoded as | |
1714 | @samp{Q33FooU5_03193Bar}. | |
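
The counting rule for qualified names can be sketched like this. The
function name is hypothetical, and the parts are assumed to have been
encoded individually beforehand (for example @samp{3Foo} or
@samp{U5_0319}).

```cpp
#include <string>
#include <vector>

// Encode a qualified name from parts that are already individually encoded.
std::string encode_qualified(const std::vector<std::string> &encoded_parts)
{
    std::string count = std::to_string(encoded_parts.size());
    std::string out = "Q";
    // Counts of 9 or less are written bare; larger ones as _<count>_.
    out += (encoded_parts.size() <= 9) ? count : "_" + count + "_";
    for (const std::string &part : encoded_parts)
        out += part;
    return out;
}
```

Given the parts @samp{3Foo}, @samp{U5_0319} and @samp{3Bar}, this yields
@samp{Q33FooU5_03193Bar}, matching the example above.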
1715 | ||
61fbdb55 AM |
1716 | Squangling utilizes the letter @samp{K} to indicate a | |
1717 | remembered portion of a qualified name. As qualified names are processed | |
1718 | for an identifier, the names are numbered and remembered in a | |
1719 | manner similar to the @samp{B} type compression code. | |
1720 | Names are recognized left to right, and given increasing values, which are | |
1721 | appended to the code in the standard manner, i.e., multiple digit numbers | |
1722 | are delimited by @samp{_} characters. | |
1723 | ||
1724 | For example | |
1725 | @example | |
1726 | class Andrew | |
1727 | @{ | |
1728 | class WasHere | |
1729 | @{ | |
1730 | class AndHereToo | |
1731 | @{ | |
1732 | @}; | |
1733 | @}; | |
1734 | @}; | |
1735 | ||
1736 | f(Andrew&r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @} | |
1737 | ||
1738 | K0 -> Andrew | |
1739 | K1 -> Andrew::WasHere | |
1740 | K2 -> Andrew::WasHere::AndHereToo | |
1741 | @end example | |
1742 | Function @samp{f()} would be mangled as: | |
1743 | @samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo} | |
1744 | ||
1745 | There are some occasions when either a @samp{B} or @samp{K} code could | |
1746 | be chosen; preference is always given to the @samp{B} code. I.e., the example | |
1747 | in the section on @samp{B} mangling could have used a @samp{K} code | |
1748 | instead of @samp{B2}. | |
1749 | ||
42976354 BK |
1750 | @subsection Templates |
1751 | ||
386b8a85 | 1752 | A class template instantiation is encoded as the letter @samp{t}, |
42976354 BK |
1753 | followed by the encoding of the template name, followed |
1754 | by the number of template parameters, followed by the encoding of the template | |
1755 | parameters. If a template parameter is a type, it is written | |
1756 | as a @samp{Z} followed by the encoding of the type. | |
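
For class templates whose parameters are all types, the rule above can
be sketched as follows. The function name is hypothetical, non-type
template parameters are not handled, and the argument types are assumed
to be mangled already.

```cpp
#include <string>
#include <vector>

// Encode a class template instantiation whose parameters are all types.
// type_args holds the already-mangled argument types.
std::string encode_class_template(const std::string &template_name,
                                  const std::vector<std::string> &type_args)
{
    std::string out = "t";
    out += std::to_string(template_name.size()) + template_name;
    out += std::to_string(type_args.size());
    for (const std::string &arg : type_args)
        out += "Z" + arg; // Z marks a type parameter
    return out;
}
```

For instance, @code{class1<class3>} with the argument mangled as
@samp{6class3} gives @samp{t6class11Z6class3}, the fragment that appears
inside the squangling example earlier in this section.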
1757 | ||
386b8a85 JM |
1758 | A function template specialization (either an instantiation or an |
1759 | explicit specialization) is encoded by an @samp{H} followed by the | |
f84b4be9 JM |
1760 | encoding of the template parameters, as described above, followed by an |
1761 | @samp{_}, the encoding of the argument types to the template function | |
1762 | (not the specialization), another @samp{_}, and the return type. (Like | |
1763 | the argument types, the return type is the return type of the function | |
386b8a85 JM |
1764 | template, not the specialization.) Template parameters in the argument |
1765 | and return types are encoded by an @samp{X} for type parameters, or a | |
f84b4be9 JM |
1766 | @samp{Y} for constant parameters, an index indicating their position |
1767 | in the template parameter list declaration, and their template depth. | |
386b8a85 | 1768 | |
42976354 BK |
1769 | @subsection Arrays |
1770 | ||
1771 | C++ array types are mangled by emitting @samp{A}, followed by | |
1772 | the length of the array, followed by an @samp{_}, followed by | |
1773 | the mangling of the element type. Of course, normally | |
1774 | array parameter types decay into pointer types, so you | |
1775 | don't see this. | |
1776 | ||
1777 | Java arrays are objects. A Java type @code{T[]} is mangled | |
1778 | as if it were the C++ type @code{JArray<T>}. | |
1779 | For example @code{java.lang.String[]} is encoded as | |
1780 | @samp{Pt6JArray1ZPQ34java4lang6String}. | |
1781 | ||
5427d758 MT |
1782 | @subsection Static fields |
1783 | ||
1784 | Both C++ and Java classes can have static fields. | |
1785 | These are allocated statically, and are shared among all instances. | |
1786 | ||
1787 | The mangling starts with a prefix (@samp{_} in most systems), which is | |
1788 | followed by the mangling | |
1789 | of the class name, followed by the "joiner" and finally the field name. | |
1790 | The joiner (see @code{JOINER} in @code{cp-tree.h}) is a special | |
1791 | separator character. For historical reasons (and idiosyncrasies | |
1792 | of assembler syntax) it can be @samp{$} or @samp{.} (or even | |
1793 | @samp{_} on a few systems). If the joiner is @samp{_} then the prefix | |
1794 | is @samp{__static_} instead of just @samp{_}. | |
1795 | ||
1796 | For example @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax) | |
1797 | would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var} | |
1798 | (or rarely @samp{__static_Q23Foo3Bar_var}). | |
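
The prefix and joiner rule can be sketched as below. The function name is
hypothetical, the class name is assumed to be mangled already, and the
@samp{_} prefix is taken as given even though the text says it applies
only on most systems.

```cpp
#include <string>

// Build the assembler name of a static field from an already-mangled
// class name, the field name, and the target's joiner character.
std::string mangle_static_field(const std::string &encoded_class,
                                const std::string &field,
                                char joiner)
{
    // When the joiner is '_', the longer prefix is used instead.
    std::string prefix = (joiner == '_') ? "__static_" : "_";
    return prefix + encoded_class + joiner + field;
}
```

With the class encoded as @samp{Q23Foo3Bar} and the field @code{var},
the three joiners give the three spellings listed above.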
1799 | ||
1800 | If the name of a static variable needs Unicode escapes, | |
1801 | the Unicode indicator @samp{U} comes before the "joiner". | |
1802 | Thus @code{\u1234Foo::var\u3445} becomes @code{_U8_1234FooU.var_3445}. | |
1803 | ||
42976354 BK |
1804 | @subsection Table of demangling code characters |
1805 | ||
1806 | The following special characters are used in mangling: | |
1807 | ||
1808 | @table @samp | |
1809 | @item A | |
1810 | Indicates a C++ array type. | |
1811 | ||
1812 | @item b | |
1813 | Encodes the C++ @code{bool} type, | |
1814 | and the Java @code{boolean} type. | |
1815 | ||
ff29fd00 | 1816 | @item B |
61fbdb55 | 1817 | Used for squangling. Similar in concept to the 'T' non-squangled code. |
ff29fd00 | 1818 | |
42976354 BK |
1819 | @item c |
1820 | Encodes the C++ @code{char} type, and the Java @code{byte} type. | |
1821 | ||
1822 | @item C | |
1823 | A modifier to indicate a @code{const} type. | |
1824 | Also used to indicate a @code{const} member function | |
1825 | (in which case it precedes the encoding of the method's class). | |
1826 | ||
1827 | @item d | |
1828 | Encodes the C++ and Java @code{double} types. | |
1829 | ||
1830 | @item e | |
1831 | Indicates extra unknown arguments @code{...}. | |
1832 | ||
ff29fd00 MM |
1833 | @item E |
1834 | Indicates the opening parenthesis of an expression. | |
1835 | ||
42976354 BK |
1836 | @item f |
1837 | Encodes the C++ and Java @code{float} types. | |
1838 | ||
1839 | @item F | |
1840 | Used to indicate a function type. | |
1841 | ||
386b8a85 JM |
1842 | @item H |
1843 | Used to indicate a template function. | |
1844 | ||
42976354 BK |
1845 | @item i |
1846 | Encodes the C++ and Java @code{int} types. | |
1847 | ||
1848 | @item J | |
1849 | Indicates a complex type. | |
1850 | ||
ff29fd00 | 1851 | @item K |
61fbdb55 | 1852 | Used by squangling to compress qualified names. |
ff29fd00 | 1853 | |
42976354 BK |
1854 | @item l |
1855 | Encodes the C++ @code{long} type. | |
1856 | ||
1857 | @item P | |
1858 | Indicates a pointer type. Followed by the type pointed to. | |
1859 | ||
1860 | @item Q | |
1861 | Used to mangle qualified names, which arise from nested classes. | |
1862 | Should also be used for namespaces (?). | |
1863 | In Java used to mangle package-qualified names, and inner classes. | |
1864 | ||
1865 | @item r | |
1866 | Encodes the GNU C++ @code{long double} type. | |
1867 | ||
1868 | @item R | |
1869 | Indicates a reference type. Followed by the referenced type. | |
1870 | ||
1871 | @item s | |
1872 | Encodes the C++ and Java @code{short} types. | |
1873 | ||
1874 | @item S | |
1875 | A modifier that indicates that the following integer type is signed. | |
1876 | Only used with @code{char}. | |
1877 | ||
1878 | Also used as a modifier to indicate a static member function. | |
1879 | ||
1880 | @item t | |
1881 | Indicates a template instantiation. | |
1882 | ||
1883 | @item T | |
1884 | A back reference to a previously seen type. | |
1885 | ||
1886 | @item U | |
1887 | A modifier that indicates that the following integer type is unsigned. | |
1888 | Also used to indicate that the following class or namespace name | |
1889 | is encoded using Unicode-mangling. | |
1890 | ||
1891 | @item v | |
1892 | Encodes the C++ and Java @code{void} types. | |
1893 | ||
1894 | @item V | |
1895 | A modifier for a @code{volatile} type or method. | |
1896 | ||
1897 | @item w | |
1898 | Encodes the C++ @code{wchar_t} type, and the Java @code{char} type. | |
1899 | ||
ff29fd00 MM |
1900 | @item W |
1901 | Indicates the closing parenthesis of an expression. | |
1902 | ||
42976354 BK |
1903 | @item x |
1904 | Encodes the GNU C++ @code{long long} type, and the Java @code{long} type. | |
1905 | ||
386b8a85 JM |
1906 | @item X |
1907 | Encodes a template type parameter, when part of a function type. | |
1908 | ||
1909 | @item Y | |
1910 | Encodes a template constant parameter, when part of a function type. | |
1911 | ||
42976354 BK |
1912 | @item Z |
1913 | Used for template type parameters. | |
1914 | ||
1915 | @end table | |
1916 | ||
1917 | The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p} | |
1918 | also seem to be used for obscure purposes ... | |
1919 | ||
1920 | @node Concept Index, , Mangling, Top | |
a5894242 | 1921 | |
8d08fdba MS |
1922 | @section Concept Index |
1923 | ||
1924 | @printindex cp | |
1925 | ||
1926 | @bye |