]>
Commit | Line | Data |
---|---|---|
474e8e70 | 1 | @c markers: BUG TODO |
6de9cd9a | 2 | |
23a5b65a | 3 | @c Copyright (C) 1988-2014 Free Software Foundation, Inc. |
73a8ed7e JM |
4 | @c This is part of the GCC manual. |
5 | @c For copying conditions, see the file gcc.texi. | |
6 | ||
7 | @node Passes | |
8 | @chapter Passes and Files of the Compiler | |
9 | @cindex passes and files of the compiler | |
10 | @cindex files and passes of the compiler | |
11 | @cindex compiler passes and files | |
b1055be0 | 12 | @cindex pass dumps |
73a8ed7e | 13 | |
6de9cd9a | 14 | This chapter is dedicated to giving an overview of the optimization and |
f0eb93a8 | 15 | code generation passes of the compiler. In the process, it describes |
6de9cd9a DN |
16 | some of the language front end interface, though this description is no |
17 | where near complete. | |
18 | ||
19 | @menu | |
20 | * Parsing pass:: The language front end turns text into bits. | |
36536d79 | 21 | * Cilk Plus Transformation:: Transform Cilk Plus Code to equivalent C/C++. |
6de9cd9a | 22 | * Gimplification pass:: The bits are turned into something we can optimize. |
6ccde948 | 23 | * Pass manager:: Sequencing the optimization passes. |
94324dae | 24 | * Tree SSA passes:: Optimizations on a high-level representation. |
6de9cd9a | 25 | * RTL passes:: Optimizations on a low-level representation. |
b1055be0 | 26 | * Optimization info:: Dumping optimization information from passes. |
6de9cd9a DN |
27 | @end menu |
28 | ||
29 | @node Parsing pass | |
30 | @section Parsing pass | |
31 | @cindex GENERIC | |
32 | @findex lang_hooks.parse_file | |
33 | The language front end is invoked only once, via | |
34 | @code{lang_hooks.parse_file}, to parse the entire input. The language | |
f0eb93a8 | 35 | front end may use any intermediate language representation deemed |
474e8e70 | 36 | appropriate. The C front end uses GENERIC trees (@pxref{GENERIC}), plus |
6de9cd9a DN |
37 | a double handful of language specific tree codes defined in |
38 | @file{c-common.def}. The Fortran front end uses a completely different | |
39 | private representation. | |
40 | ||
41 | @cindex GIMPLE | |
42 | @cindex gimplification | |
43 | @cindex gimplifier | |
44 | @cindex language-independent intermediate representation | |
45 | @cindex intermediate representation lowering | |
46 | @cindex lowering, language-dependent intermediate representation | |
47 | At some point the front end must translate the representation used in the | |
f0eb93a8 | 48 | front end to a representation understood by the language-independent |
6de9cd9a | 49 | portions of the compiler. Current practice takes one of two forms. |
474e8e70 | 50 | The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function, |
f0eb93a8 | 51 | and uses the gimplifier callbacks to convert the language-specific tree |
474e8e70 | 52 | nodes directly to GIMPLE before passing the function off to be compiled. |
6de9cd9a DN |
53 | The Fortran front end converts from a private representation to GENERIC, |
54 | which is later lowered to GIMPLE when the function is compiled. Which | |
55 | route to choose probably depends on how well GENERIC (plus extensions) | |
56 | can be made to match up with the source language and necessary parsing | |
57 | data structures. | |
58 | ||
59 | BUG: Gimplification must occur before nested function lowering, | |
60 | and nested function lowering must be done by the front end before | |
61 | passing the data off to cgraph. | |
62 | ||
63 | TODO: Cgraph should control nested function lowering. It would | |
64 | only be invoked when it is certain that the outer-most function | |
65 | is used. | |
66 | ||
67 | TODO: Cgraph needs a gimplify_function callback. It should be | |
68 | invoked when (1) it is certain that the function is used, (2) | |
69 | warning flags specified by the user require some amount of | |
70 | compilation in order to honor, (3) the language indicates that | |
71 | semantic analysis is not complete until gimplification occurs. | |
8a36672b | 72 | Hum@dots{} this sounds overly complicated. Perhaps we should just |
6de9cd9a DN |
73 | have the front end gimplify always; in most cases it's only one |
74 | function call. | |
75 | ||
f0eb93a8 | 76 | The front end needs to pass all function definitions and top level |
6de9cd9a DN |
77 | declarations off to the middle-end so that they can be compiled and |
78 | emitted to the object file. For a simple procedural language, it is | |
79 | usually most convenient to do this as each top level declaration or | |
80 | definition is seen. There is also a distinction to be made between | |
81 | generating functional code and generating complete debug information. | |
82 | The only thing that is absolutely required for functional code is that | |
27ef2cdd | 83 | function and data @emph{definitions} be passed to the middle-end. For |
f0eb93a8 | 84 | complete debug information, function, data and type declarations |
6de9cd9a DN |
85 | should all be passed as well. |
86 | ||
73a8ed7e | 87 | @findex rest_of_decl_compilation |
6de9cd9a DN |
88 | @findex rest_of_type_compilation |
89 | @findex cgraph_finalize_function | |
90 | In any case, the front end needs each complete top-level function or | |
f0eb93a8 | 91 | data declaration, and each data definition should be passed to |
6de9cd9a DN |
92 | @code{rest_of_decl_compilation}. Each complete type definition should |
93 | be passed to @code{rest_of_type_compilation}. Each function definition | |
94 | should be passed to @code{cgraph_finalize_function}. | |
95 | ||
96 | TODO: I know rest_of_compilation currently has all sorts of | |
94324dae EB |
97 | RTL generation semantics. I plan to move all code generation |
98 | bits (both Tree and RTL) to compile_function. Should we hide | |
6de9cd9a DN |
99 | cgraph from the front ends and move back to rest_of_compilation |
100 | as the official interface? Possibly we should rename all three | |
101 | interfaces such that the names match in some meaningful way and | |
102 | that is more descriptive than "rest_of". | |
103 | ||
f0eb93a8 | 104 | The middle-end will, at its option, emit the function and data |
6de9cd9a DN |
105 | definitions immediately or queue them for later processing. |
106 | ||
36536d79 BI |
107 | @node Cilk Plus Transformation |
108 | @section Cilk Plus Transformation | |
109 | @cindex CILK_PLUS | |
110 | ||
111 | If Cilk Plus generation (flag @option{-fcilkplus}) is enabled, all the Cilk | |
112 | Plus code is transformed into equivalent C and C++ functions. Majority of this | |
113 | transformation occurs toward the end of the parsing and right before the | |
114 | gimplification pass. | |
115 | ||
116 | These are the major components to the Cilk Plus language extension: | |
117 | @itemize @bullet | |
118 | @item Array Notations: | |
119 | During parsing phase, all the array notation specific information is stored in | |
120 | @code{ARRAY_NOTATION_REF} tree using the function | |
121 | @code{c_parser_array_notation}. During the end of parsing, we check the entire | |
122 | function to see if there are any array notation specific code (using the | |
123 | function @code{contains_array_notation_expr}). If this function returns | |
124 | true, then we expand them using either @code{expand_array_notation_exprs} or | |
125 | @code{build_array_notation_expr}. For the cases where array notations are | |
126 | inside conditions, they are transformed using the function | |
127 | @code{fix_conditional_array_notations}. The C language-specific routines are | |
128 | located in @file{c/c-array-notation.c} and the equivalent C++ routines are in | |
939b37da BI |
129 | the file @file{cp/cp-array-notation.c}. Common routines such as functions to |
130 | initialize built-in functions are stored in @file{array-notation-common.c}. | |
131 | ||
132 | @item Cilk keywords: | |
133 | @itemize @bullet | |
134 | @item @code{_Cilk_spawn}: | |
135 | The @code{_Cilk_spawn} keyword is parsed and the function it contains is marked | |
136 | as a spawning function. The spawning function is called the spawner. At | |
137 | the end of the parsing phase, appropriate built-in functions are | |
138 | added to the spawner that are defined in the Cilk runtime. The appropriate | |
139 | locations of these functions, and the internal structures are detailed in | |
140 | @code{cilk_init_builtins} in the file @file{cilk-common.c}. The pointers to | |
141 | Cilk functions and fields of internal structures are described | |
142 | in @file{cilk.h}. The built-in functions are described in | |
143 | @file{cilk-builtins.def}. | |
144 | ||
145 | During gimplification, a new "spawn-helper" function is created. | |
146 | The spawned function is replaced with a spawn helper function in the spawner. | |
147 | The spawned function-call is moved into the spawn helper. The main function | |
148 | that does these transformations is @code{gimplify_cilk_spawn} in | |
149 | @file{c-family/cilk.c}. In the spawn-helper, the gimplification function | |
150 | @code{gimplify_call_expr}, inserts a function call @code{__cilkrts_detach}. | |
151 | This function is expanded by @code{builtin_expand_cilk_detach} located in | |
152 | @file{c-family/cilk.c}. | |
153 | ||
154 | @item @code{_Cilk_sync}: | |
155 | @code{_Cilk_sync} is parsed like a keyword. During gimplification, | |
156 | the function @code{gimplify_cilk_sync} in @file{c-family/cilk.c}, will replace | |
157 | this keyword with a set of functions that are stored in the Cilk runtime. | |
158 | One of the internal functions inserted during gimplification, | |
159 | @code{__cilkrts_pop_frame} must be expanded by the compiler and is | |
160 | done by @code{builtin_expand_cilk_pop_frame} in @file{cilk-common.c}. | |
161 | ||
162 | @end itemize | |
36536d79 BI |
163 | @end itemize |
164 | ||
939b37da BI |
165 | Documentation about Cilk Plus and language specification is provided under the |
166 | "Learn" section in @w{@uref{http://www.cilkplus.org/}}. It is worth mentioning | |
167 | that the current implementation follows ABI 1.1. | |
36536d79 | 168 | |
6de9cd9a DN |
169 | @node Gimplification pass |
170 | @section Gimplification pass | |
171 | ||
172 | @cindex gimplification | |
173 | @cindex GIMPLE | |
174 | @dfn{Gimplification} is a whimsical term for the process of converting | |
175 | the intermediate representation of a function into the GIMPLE language | |
474e8e70 | 176 | (@pxref{GIMPLE}). The term stuck, and so words like ``gimplification'', |
d78aa55c | 177 | ``gimplify'', ``gimplifier'' and the like are sprinkled throughout this |
6de9cd9a DN |
178 | section of code. |
179 | ||
6de9cd9a | 180 | While a front end may certainly choose to generate GIMPLE directly if |
f0eb93a8 | 181 | it chooses, this can be a moderately complex process unless the |
6de9cd9a DN |
182 | intermediate language used by the front end is already fairly simple. |
183 | Usually it is easier to generate GENERIC trees plus extensions | |
184 | and let the language-independent gimplifier do most of the work. | |
185 | ||
186 | @findex gimplify_function_tree | |
187 | @findex gimplify_expr | |
188 | @findex lang_hooks.gimplify_expr | |
189 | The main entry point to this pass is @code{gimplify_function_tree} | |
f0eb93a8 | 190 | located in @file{gimplify.c}. From here we process the entire |
6de9cd9a DN |
191 | function gimplifying each statement in turn. The main workhorse |
192 | for this pass is @code{gimplify_expr}. Approximately everything | |
193 | passes through here at least once, and it is from here that we | |
194 | invoke the @code{lang_hooks.gimplify_expr} callback. | |
195 | ||
196 | The callback should examine the expression in question and return | |
197 | @code{GS_UNHANDLED} if the expression is not a language specific | |
198 | construct that requires attention. Otherwise it should alter the | |
199 | expression in some way to such that forward progress is made toward | |
8a36672b | 200 | producing valid GIMPLE@. If the callback is certain that the |
6de9cd9a DN |
201 | transformation is complete and the expression is valid GIMPLE, it |
202 | should return @code{GS_ALL_DONE}. Otherwise it should return | |
203 | @code{GS_OK}, which will cause the expression to be processed again. | |
204 | If the callback encounters an error during the transformation (because | |
205 | the front end is relying on the gimplification process to finish | |
206 | semantic checks), it should return @code{GS_ERROR}. | |
207 | ||
208 | @node Pass manager | |
209 | @section Pass manager | |
210 | ||
d9f235fc DJ |
211 | The pass manager is located in @file{passes.c}, @file{tree-optimize.c} |
212 | and @file{tree-pass.h}. | |
2769de23 | 213 | It processes passes as described in @file{passes.def}. |
6de9cd9a DN |
214 | Its job is to run all of the individual passes in the correct order, |
215 | and take care of standard bookkeeping that applies to every pass. | |
216 | ||
217 | The theory of operation is that each pass defines a structure that | |
78466c0e | 218 | represents everything we need to know about that pass---when it |
f0eb93a8 | 219 | should be run, how it should be run, what intermediate language |
6de9cd9a DN |
220 | form or on-the-side data structures it needs. We register the pass |
221 | to be run in some particular order, and the pass manager arranges | |
222 | for everything to happen in the correct order. | |
223 | ||
224 | The actuality doesn't completely live up to the theory at present. | |
225 | Command-line switches and @code{timevar_id_t} enumerations must still | |
226 | be defined elsewhere. The pass manager validates constraints but does | |
227 | not attempt to (re-)generate data structures or lower intermediate | |
228 | language form based on the requirements of the next pass. Nevertheless, | |
229 | what is present is useful, and a far sight better than nothing at all. | |
230 | ||
e0a42b0f | 231 | Each pass should have a unique name. |
8e352cd3 | 232 | Each pass may have its own dump file (for GCC debugging purposes). |
e0a42b0f ZC |
233 | Passes with a name starting with a star do not dump anything. |
234 | Sometimes passes are supposed to share a dump file / option name. | |
235 | To still give these unique names, you can use a prefix that is delimited | |
236 | by a space from the part that is used for the dump file / option name. | |
237 | E.g. When the pass name is "ud dce", the name used for dump file/options | |
238 | is "dce". | |
8e352cd3 | 239 | |
6de9cd9a DN |
240 | TODO: describe the global variables set up by the pass manager, |
241 | and a brief description of how a new pass should use it. | |
94324dae | 242 | I need to look at what info RTL passes use first@enddots{} |
6de9cd9a | 243 | |
94324dae EB |
244 | @node Tree SSA passes |
245 | @section Tree SSA passes | |
6de9cd9a | 246 | |
94324dae | 247 | The following briefly describes the Tree optimization passes that are |
6de9cd9a | 248 | run after gimplification and what source files they are located in. |
73a8ed7e JM |
249 | |
250 | @itemize @bullet | |
6de9cd9a DN |
251 | @item Remove useless statements |
252 | ||
253 | This pass is an extremely simple sweep across the gimple code in which | |
254 | we identify obviously dead code and remove it. Here we do things like | |
255 | simplify @code{if} statements with constant conditions, remove | |
256 | exception handling constructs surrounding code that obviously cannot | |
257 | throw, remove lexical bindings that contain no variables, and other | |
258 | assorted simplistic cleanups. The idea is to get rid of the obvious | |
259 | stuff quickly rather than wait until later when it's more work to get | |
260 | rid of it. This pass is located in @file{tree-cfg.c} and described by | |
261 | @code{pass_remove_useless_stmts}. | |
262 | ||
917f1b7e | 263 | @item OpenMP lowering |
48fa3029 AH |
264 | |
265 | If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers | |
266 | OpenMP constructs into GIMPLE. | |
267 | ||
268 | Lowering of OpenMP constructs involves creating replacement | |
269 | expressions for local variables that have been mapped using data | |
270 | sharing clauses, exposing the control flow of most synchronization | |
271 | directives and adding region markers to facilitate the creation of the | |
272 | control flow graph. The pass is located in @file{omp-low.c} and is | |
273 | described by @code{pass_lower_omp}. | |
274 | ||
275 | @item OpenMP expansion | |
276 | ||
277 | If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands | |
278 | parallel regions into their own functions to be invoked by the thread | |
279 | library. The pass is located in @file{omp-low.c} and is described by | |
280 | @code{pass_expand_omp}. | |
281 | ||
6de9cd9a DN |
282 | @item Lower control flow |
283 | ||
1eaf20ec | 284 | This pass flattens @code{if} statements (@code{COND_EXPR}) |
6de9cd9a DN |
285 | and moves lexical bindings (@code{BIND_EXPR}) out of line. After |
286 | this pass, all @code{if} statements will have exactly two @code{goto} | |
287 | statements in its @code{then} and @code{else} arms. Lexical binding | |
288 | information for each statement will be found in @code{TREE_BLOCK} rather | |
289 | than being inferred from its position under a @code{BIND_EXPR}. This | |
f0eb93a8 | 290 | pass is found in @file{gimple-low.c} and is described by |
6de9cd9a DN |
291 | @code{pass_lower_cf}. |
292 | ||
293 | @item Lower exception handling control flow | |
294 | ||
295 | This pass decomposes high-level exception handling constructs | |
296 | (@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form | |
297 | that explicitly represents the control flow involved. After this | |
298 | pass, @code{lookup_stmt_eh_region} will return a non-negative | |
299 | number for any statement that may have EH control flow semantics; | |
300 | examine @code{tree_can_throw_internal} or @code{tree_can_throw_external} | |
301 | for exact semantics. Exact control flow may be extracted from | |
302 | @code{foreach_reachable_handler}. The EH region nesting tree is defined | |
303 | in @file{except.h} and built in @file{except.c}. The lowering pass | |
304 | itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}. | |
305 | ||
306 | @item Build the control flow graph | |
307 | ||
308 | This pass decomposes a function into basic blocks and creates all of | |
309 | the edges that connect them. It is located in @file{tree-cfg.c} and | |
310 | is described by @code{pass_build_cfg}. | |
311 | ||
312 | @item Find all referenced variables | |
313 | ||
f0eb93a8 | 314 | This pass walks the entire function and collects an array of all |
6de9cd9a DN |
315 | variables referenced in the function, @code{referenced_vars}. The |
316 | index at which a variable is found in the array is used as a UID | |
317 | for the variable within this function. This data is needed by the | |
318 | SSA rewriting routines. The pass is located in @file{tree-dfa.c} | |
319 | and is described by @code{pass_referenced_vars}. | |
320 | ||
6de9cd9a DN |
321 | @item Enter static single assignment form |
322 | ||
323 | This pass rewrites the function such that it is in SSA form. After | |
324 | this pass, all @code{is_gimple_reg} variables will be referenced by | |
f0eb93a8 | 325 | @code{SSA_NAME}, and all occurrences of other variables will be |
a6719dc6 | 326 | annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have |
6de9cd9a DN |
327 | been inserted as necessary for each basic block. This pass is |
328 | located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}. | |
329 | ||
330 | @item Warn for uninitialized variables | |
331 | ||
332 | This pass scans the function for uses of @code{SSA_NAME}s that | |
333 | are fed by default definition. For non-parameter variables, such | |
334 | uses are uninitialized. The pass is run twice, before and after | |
2dc74010 | 335 | optimization (if turned on). In the first pass we only warn for uses that are |
6de9cd9a DN |
336 | positively uninitialized; in the second pass we warn for uses that |
337 | are possibly uninitialized. The pass is located in @file{tree-ssa.c} | |
338 | and is defined by @code{pass_early_warn_uninitialized} and | |
339 | @code{pass_late_warn_uninitialized}. | |
340 | ||
341 | @item Dead code elimination | |
342 | ||
343 | This pass scans the function for statements without side effects whose | |
344 | result is unused. It does not do memory life analysis, so any value | |
345 | that is stored in memory is considered used. The pass is run multiple | |
346 | times throughout the optimization process. It is located in | |
347 | @file{tree-ssa-dce.c} and is described by @code{pass_dce}. | |
348 | ||
349 | @item Dominator optimizations | |
350 | ||
351 | This pass performs trivial dominator-based copy and constant propagation, | |
352 | expression simplification, and jump threading. It is run multiple times | |
e4ae5e77 | 353 | throughout the optimization process. It is located in @file{tree-ssa-dom.c} |
6de9cd9a DN |
354 | and is described by @code{pass_dominator}. |
355 | ||
6de9cd9a DN |
356 | @item Forward propagation of single-use variables |
357 | ||
358 | This pass attempts to remove redundant computation by substituting | |
359 | variables that are used once into the expression that uses them and | |
f0eb93a8 | 360 | seeing if the result can be simplified. It is located in |
6de9cd9a DN |
361 | @file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}. |
362 | ||
363 | @item Copy Renaming | |
364 | ||
f0eb93a8 | 365 | This pass attempts to change the name of compiler temporaries involved in |
8a36672b | 366 | copy operations such that SSA->normal can coalesce the copy away. When compiler |
6de9cd9a | 367 | temporaries are copies of user variables, it also renames the compiler |
f0eb93a8 JM |
368 | temporary to the user variable resulting in better use of user symbols. It is |
369 | located in @file{tree-ssa-copyrename.c} and is described by | |
6de9cd9a DN |
370 | @code{pass_copyrename}. |
371 | ||
372 | @item PHI node optimizations | |
373 | ||
a6719dc6 | 374 | This pass recognizes forms of PHI inputs that can be represented as |
6de9cd9a | 375 | conditional expressions and rewrites them into straight line code. |
f0eb93a8 | 376 | It is located in @file{tree-ssa-phiopt.c} and is described by |
6de9cd9a DN |
377 | @code{pass_phiopt}. |
378 | ||
379 | @item May-alias optimization | |
380 | ||
381 | This pass performs a flow sensitive SSA-based points-to analysis. | |
382 | The resulting may-alias, must-alias, and escape analysis information | |
383 | is used to promote variables from in-memory addressable objects to | |
384 | non-aliased variables that can be renamed into SSA form. We also | |
1eaf20ec | 385 | update the @code{VDEF}/@code{VUSE} memory tags for non-renameable |
6de9cd9a DN |
386 | aggregates so that we get fewer false kills. The pass is located |
387 | in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}. | |
388 | ||
a6719dc6 DN |
389 | Interprocedural points-to information is located in |
390 | @file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}. | |
391 | ||
6de9cd9a DN |
392 | @item Profiling |
393 | ||
4df199d1 | 394 | This pass instruments the function in order to collect runtime block |
6de9cd9a DN |
395 | and value profiling data. Such data may be fed back into the compiler |
396 | on a subsequent run so as to allow optimization based on expected | |
4df199d1 JH |
397 | execution frequencies. The pass is located in @file{tree-profile.c} and |
398 | is described by @code{pass_ipa_tree_profile}. | |
399 | ||
400 | @item Static profile estimation | |
401 | ||
402 | This pass implements series of heuristics to guess propababilities | |
403 | of branches. The resulting predictions are turned into edge profile | |
404 | by propagating branches across the control flow graphs. | |
405 | The pass is located in @file{tree-profile.c} and is described by | |
406 | @code{pass_profile}. | |
6de9cd9a DN |
407 | |
408 | @item Lower complex arithmetic | |
409 | ||
410 | This pass rewrites complex arithmetic operations into their component | |
411 | scalar arithmetic operations. The pass is located in @file{tree-complex.c} | |
412 | and is described by @code{pass_lower_complex}. | |
413 | ||
414 | @item Scalar replacement of aggregates | |
415 | ||
416 | This pass rewrites suitable non-aliased local aggregate variables into | |
f0eb93a8 | 417 | a set of scalar variables. The resulting scalar variables are |
6de9cd9a DN |
418 | rewritten into SSA form, which allows subsequent optimization passes |
419 | to do a significantly better job with them. The pass is located in | |
420 | @file{tree-sra.c} and is described by @code{pass_sra}. | |
421 | ||
422 | @item Dead store elimination | |
423 | ||
424 | This pass eliminates stores to memory that are subsequently overwritten | |
425 | by another store, without any intervening loads. The pass is located | |
426 | in @file{tree-ssa-dse.c} and is described by @code{pass_dse}. | |
427 | ||
428 | @item Tail recursion elimination | |
429 | ||
430 | This pass transforms tail recursion into a loop. It is located in | |
431 | @file{tree-tailcall.c} and is described by @code{pass_tail_recursion}. | |
432 | ||
fa555252 DB |
433 | @item Forward store motion |
434 | ||
dc9a511d | 435 | This pass sinks stores and assignments down the flowgraph closer to their |
fa555252 | 436 | use point. The pass is located in @file{tree-ssa-sink.c} and is |
f8912a55 | 437 | described by @code{pass_sink_code}. |
fa555252 | 438 | |
6de9cd9a DN |
439 | @item Partial redundancy elimination |
440 | ||
441 | This pass eliminates partially redundant computations, as well as | |
442 | performing load motion. The pass is located in @file{tree-ssa-pre.c} | |
443 | and is described by @code{pass_pre}. | |
444 | ||
f8912a55 PB |
445 | Just before partial redundancy elimination, if |
446 | @option{-funsafe-math-optimizations} is on, GCC tries to convert | |
447 | divisions to multiplications by the reciprocal. The pass is located | |
448 | in @file{tree-ssa-math-opts.c} and is described by | |
449 | @code{pass_cse_reciprocal}. | |
450 | ||
a6719dc6 DN |
451 | @item Full redundancy elimination |
452 | ||
dc9a511d | 453 | This is a simpler form of PRE that only eliminates redundancies that |
6a953a91 | 454 | occur on all paths. It is located in @file{tree-ssa-pre.c} and |
a6719dc6 DN |
455 | described by @code{pass_fre}. |
456 | ||
6de9cd9a DN |
457 | @item Loop optimization |
458 | ||
a7e5372d ZD |
459 | The main driver of the pass is placed in @file{tree-ssa-loop.c} |
460 | and described by @code{pass_loop}. | |
461 | ||
462 | The optimizations performed by this pass are: | |
463 | ||
464 | Loop invariant motion. This pass moves only invariants that | |
94324dae | 465 | would be hard to handle on RTL level (function calls, operations that expand to |
a7e5372d ZD |
466 | nontrivial sequences of insns). With @option{-funswitch-loops} it also moves |
467 | operands of conditions that are invariant out of the loop, so that we can use | |
468 | just trivial invariantness analysis in loop unswitching. The pass also includes | |
469 | store motion. The pass is implemented in @file{tree-ssa-loop-im.c}. | |
470 | ||
82b85a85 ZD |
471 | Canonical induction variable creation. This pass creates a simple counter |
472 | for number of iterations of the loop and replaces the exit condition of the | |
473 | loop using it, in case when a complicated analysis is necessary to determine | |
474 | the number of iterations. Later optimizations then may determine the number | |
475 | easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}. | |
476 | ||
8b11a64c ZD |
477 | Induction variable optimizations. This pass performs standard induction |
478 | variable optimizations, including strength reduction, induction variable | |
479 | merging and induction variable elimination. The pass is implemented in | |
480 | @file{tree-ssa-loop-ivopts.c}. | |
481 | ||
92fc4a2f ZD |
482 | Loop unswitching. This pass moves the conditional jumps that are invariant |
483 | out of the loops. To achieve this, a duplicate of the loop is created for | |
484 | each possible outcome of conditional jump(s). The pass is implemented in | |
c4597c1d | 485 | @file{tree-ssa-loop-unswitch.c}. |
92fc4a2f | 486 | |
a7e5372d | 487 | The optimizations also use various utility functions contained in |
82b85a85 ZD |
488 | @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and |
489 | @file{cfgloopmanip.c}. | |
73a8ed7e | 490 | |
c9eb94f4 DN |
491 | Vectorization. This pass transforms loops to operate on vector types |
492 | instead of scalar types. Data parallelism across loop iterations is exploited | |
ff2ce160 MS |
493 | to group data elements from consecutive iterations into a vector and operate |
494 | on them in parallel. Depending on available target support the loop is | |
c9eb94f4 | 495 | conceptually unrolled by a factor @code{VF} (vectorization factor), which is |
ff2ce160 | 496 | the number of elements operated upon in parallel in each iteration, and the |
c9eb94f4 DN |
497 | @code{VF} copies of each scalar operation are fused to form a vector operation. |
498 | Additional loop transformations such as peeling and versioning may take place | |
ff2ce160 | 499 | to align the number of iterations, and to align the memory accesses in the |
a70d6342 IR |
500 | loop. |
501 | The pass is implemented in @file{tree-vectorizer.c} (the main driver), | |
ff2ce160 MS |
502 | @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts |
503 | and general loop utilities), @file{tree-vect-slp} (loop-aware SLP | |
a70d6342 | 504 | functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. |
c9eb94f4 DN |
505 | Analysis of data references is in @file{tree-data-ref.c}. |
506 | ||
a70d6342 IR |
507 | SLP Vectorization. This pass performs vectorization of straight-line code. The |
508 | pass is implemented in @file{tree-vectorizer.c} (the main driver), | |
ff2ce160 | 509 | @file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and |
a70d6342 IR |
510 | @file{tree-vect-data-refs.c}. |
511 | ||
5f40b3cb ZD |
512 | Autoparallelization. This pass splits the loop iteration space to run |
513 | into several threads. The pass is implemented in @file{tree-parloops.c}. | |
514 | ||
4e4ee197 SP |
515 | Graphite is a loop transformation framework based on the polyhedral |
516 | model. Graphite stands for Gimple Represented as Polyhedra. The | |
517 | internals of this infrastructure are documented in | |
518 | @w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on | |
519 | this representation are implemented in the various @file{graphite-*} | |
520 | files. | |
521 | ||
40923b20 DP |
522 | @item Tree level if-conversion for vectorizer |
523 | ||
524 | This pass applies if-conversion to simple loops to help vectorizer. | |
a6719dc6 | 525 | We identify if convertible loops, if-convert statements and merge |
8a36672b | 526 | basic blocks in one big block. The idea is to present loop in such |
40923b20 | 527 | form so that vectorizer can have one to one mapping between statements |
ff2ce160 | 528 | and available vector operations. This pass is located in |
eeb4dfda | 529 | @file{tree-if-conv.c} and is described by @code{pass_if_conversion}. |
40923b20 | 530 | |
6de9cd9a | 531 | @item Conditional constant propagation |
73a8ed7e | 532 | |
6de9cd9a DN |
533 | This pass relaxes a lattice of values in order to identify those |
534 | that must be constant even in the presence of conditional branches. | |
535 | The pass is located in @file{tree-ssa-ccp.c} and is described | |
536 | by @code{pass_ccp}. | |
537 | ||
a6719dc6 DN |
538 | A related pass that works on memory loads and stores, and not just |
539 | register values, is located in @file{tree-ssa-ccp.c} and described by | |
540 | @code{pass_store_ccp}. | |
541 | ||
542 | @item Conditional copy propagation | |
543 | ||
544 | This is similar to constant propagation but the lattice of values is | |
545 | the ``copy-of'' relation. It eliminates redundant copies from the | |
546 | code. The pass is located in @file{tree-ssa-copy.c} and described by | |
547 | @code{pass_copy_prop}. | |
548 | ||
549 | A related pass that works on memory copies, and not just register | |
550 | copies, is located in @file{tree-ssa-copy.c} and described by | |
551 | @code{pass_store_copy_prop}. | |
552 | ||
553 | @item Value range propagation | |
554 | ||
555 | This transformation is similar to constant propagation but | |
556 | instead of propagating single constant values, it propagates | |
557 | known value ranges. The implementation is based on Patterson's | |
558 | range propagation algorithm (Accurate Static Branch Prediction by | |
559 | Value Range Propagation, J. R. C. Patterson, PLDI '95). In | |
560 | contrast to Patterson's algorithm, this implementation does not | |
561 | propagate branch probabilities nor it uses more than a single | |
562 | range per SSA name. This means that the current implementation | |
563 | cannot be used for branch prediction (though adapting it would | |
564 | not be difficult). The pass is located in @file{tree-vrp.c} and is | |
565 | described by @code{pass_vrp}. | |
566 | ||
567 | @item Folding built-in functions | |
6de9cd9a | 568 | |
a6719dc6 | 569 | This pass simplifies built-in functions, as applicable, with constant |
a31cfd58 | 570 | arguments or with inferable string lengths. It is located in |
6de9cd9a DN |
571 | @file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}. |
572 | ||
573 | @item Split critical edges | |
574 | ||
575 | This pass identifies critical edges and inserts empty basic blocks | |
576 | such that the edge is no longer critical. The pass is located in | |
577 | @file{tree-cfg.c} and is described by @code{pass_split_crit_edges}. | |
578 | ||
6de9cd9a DN |
579 | @item Control dependence dead code elimination |
580 | ||
581 | This pass is a stronger form of dead code elimination that can | |
582 | eliminate unnecessary control flow statements. It is located | |
583 | in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}. | |
584 | ||
585 | @item Tail call elimination | |
586 | ||
587 | This pass identifies function calls that may be rewritten into | |
588 | jumps. No code transformation is actually applied here, but the | |
589 | data and control flow problem is solved. The code transformation | |
8a36672b | 590 | requires target support, and so is delayed until RTL@. In the |
6de9cd9a DN |
591 | meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility. |
592 | The pass is located in @file{tree-tailcall.c} and is described by | |
f0eb93a8 | 593 | @code{pass_tail_calls}. The RTL transformation is handled by |
6de9cd9a DN |
594 | @code{fixup_tail_calls} in @file{calls.c}. |
595 | ||
596 | @item Warn for function return without value | |
597 | ||
598 | For non-void functions, this pass locates return statements that do | |
599 | not specify a value and issues a warning. Such a statement may have | |
600 | been injected by falling off the end of the function. This pass is | |
601 | run last so that we have as much time as possible to prove that the | |
602 | statement is not reachable. It is located in @file{tree-cfg.c} and | |
603 | is described by @code{pass_warn_function_return}. | |
604 | ||
6de9cd9a DN |
605 | @item Leave static single assignment form |
606 | ||
607 | This pass rewrites the function such that it is in normal form. At | |
608 | the same time, we eliminate as many single-use temporaries as possible, | |
8a36672b | 609 | so the intermediate language is no longer GIMPLE, but GENERIC@. The |
a6719dc6 DN |
610 | pass is located in @file{tree-outof-ssa.c} and is described by |
611 | @code{pass_del_ssa}. | |
612 | ||
613 | @item Merge PHI nodes that feed into one another | |
614 | ||
615 | This is part of the CFG cleanup passes. It attempts to join PHI nodes | |
616 | from a forwarder CFG block into another block with PHI nodes. The | |
617 | pass is located in @file{tree-cfgcleanup.c} and is described by | |
618 | @code{pass_merge_phi}. | |
619 | ||
620 | @item Return value optimization | |
621 | ||
622 | If a function always returns the same local variable, and that local | |
623 | variable is an aggregate type, then the variable is replaced with the | |
624 | return value for the function (i.e., the function's DECL_RESULT). This | |
625 | is equivalent to the C++ named return value optimization applied to | |
0ee2ea09 | 626 | GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by |
a6719dc6 DN |
627 | @code{pass_nrv}. |
628 | ||
629 | @item Return slot optimization | |
630 | ||
631 | If a function returns a memory object and is called as @code{var = | |
632 | foo()}, this pass tries to change the call so that the address of | |
633 | @code{var} is sent to the caller to avoid an extra memory copy. This | |
634 | pass is located in @code{tree-nrv.c} and is described by | |
635 | @code{pass_return_slot}. | |
636 | ||
637 | @item Optimize calls to @code{__builtin_object_size} | |
638 | ||
639 | This is a propagation pass similar to CCP that tries to remove calls | |
640 | to @code{__builtin_object_size} when the size of the object can be | |
641 | computed at compile-time. This pass is located in | |
642 | @file{tree-object-size.c} and is described by | |
643 | @code{pass_object_sizes}. | |
644 | ||
645 | @item Loop invariant motion | |
646 | ||
647 | This pass removes expensive loop-invariant computations out of loops. | |
648 | The pass is located in @file{tree-ssa-loop.c} and described by | |
649 | @code{pass_lim}. | |
650 | ||
651 | @item Loop nest optimizations | |
652 | ||
653 | This is a family of loop transformations that works on loop nests. It | |
654 | includes loop interchange, scaling, skewing and reversal and they are | |
655 | all geared to the optimization of data locality in array traversals | |
656 | and the removal of dependencies that hamper optimizations such as loop | |
657 | parallelization and vectorization. The pass is located in | |
658 | @file{tree-loop-linear.c} and described by | |
659 | @code{pass_linear_transform}. | |
660 | ||
661 | @item Removal of empty loops | |
662 | ||
663 | This pass removes loops with no code in them. The pass is located in | |
664 | @file{tree-ssa-loop-ivcanon.c} and described by | |
665 | @code{pass_empty_loop}. | |
666 | ||
667 | @item Unrolling of small loops | |
668 | ||
669 | This pass completely unrolls loops with few iterations. The pass | |
670 | is located in @file{tree-ssa-loop-ivcanon.c} and described by | |
671 | @code{pass_complete_unroll}. | |
672 | ||
bbc8a8dc ZD |
673 | @item Predictive commoning |
674 | ||
675 | This pass makes the code reuse the computations from the previous | |
676 | iterations of the loops, especially loads and stores to memory. | |
677 | It does so by storing the values of these computations to a bank | |
678 | of temporary variables that are rotated at the end of loop. To avoid | |
679 | the need for this rotation, the loop is then unrolled and the copies | |
680 | of the loop body are rewritten to use the appropriate version of | |
681 | the temporary variable. This pass is located in @file{tree-predcom.c} | |
682 | and described by @code{pass_predcom}. | |
683 | ||
a6719dc6 DN |
684 | @item Array prefetching |
685 | ||
686 | This pass issues prefetch instructions for array references inside | |
687 | loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and | |
688 | described by @code{pass_loop_prefetch}. | |
689 | ||
690 | @item Reassociation | |
691 | ||
692 | This pass rewrites arithmetic expressions to enable optimizations that | |
693 | operate on them, like redundancy elimination and vectorization. The | |
694 | pass is located in @file{tree-ssa-reassoc.c} and described by | |
695 | @code{pass_reassoc}. | |
696 | ||
697 | @item Optimization of @code{stdarg} functions | |
698 | ||
699 | This pass tries to avoid the saving of register arguments into the | |
700 | stack on entry to @code{stdarg} functions. If the function doesn't | |
701 | use any @code{va_start} macros, no registers need to be saved. If | |
702 | @code{va_start} macros are used, the @code{va_list} variables don't | |
703 | escape the function, it is only necessary to save registers that will | |
704 | be used in @code{va_arg} macros. For instance, if @code{va_arg} is | |
705 | only used with integral types in the function, floating point | |
706 | registers don't need to be saved. This pass is located in | |
707 | @code{tree-stdarg.c} and described by @code{pass_stdarg}. | |
708 | ||
6de9cd9a DN |
709 | @end itemize |
710 | ||
711 | @node RTL passes | |
712 | @section RTL passes | |
713 | ||
94324dae EB |
714 | The following briefly describes the RTL generation and optimization |
715 | passes that are run after the Tree optimization passes. | |
6de9cd9a DN |
716 | |
717 | @itemize @bullet | |
718 | @item RTL generation | |
73a8ed7e JM |
719 | |
720 | @c Avoiding overfull is tricky here. | |
721 | The source files for RTL generation include | |
722 | @file{stmt.c}, | |
723 | @file{calls.c}, | |
724 | @file{expr.c}, | |
725 | @file{explow.c}, | |
726 | @file{expmed.c}, | |
727 | @file{function.c}, | |
728 | @file{optabs.c} | |
729 | and @file{emit-rtl.c}. | |
730 | Also, the file | |
731 | @file{insn-emit.c}, generated from the machine description by the | |
732 | program @code{genemit}, is used in this pass. The header file | |
733 | @file{expr.h} is used for communication within this pass. | |
734 | ||
735 | @findex genflags | |
736 | @findex gencodes | |
737 | The header files @file{insn-flags.h} and @file{insn-codes.h}, | |
738 | generated from the machine description by the programs @code{genflags} | |
739 | and @code{gencodes}, tell this pass which standard names are available | |
740 | for use and which patterns correspond to them. | |
741 | ||
94324dae | 742 | @item Generation of exception landing pads |
73a8ed7e | 743 | |
6de9cd9a DN |
744 | This pass generates the glue that handles communication between the |
745 | exception handling library routines and the exception handlers within | |
746 | the function. Entry points in the function that are invoked by the | |
747 | exception handling library are called @dfn{landing pads}. The code | |
94324dae | 748 | for this pass is located in @file{except.c}. |
73a8ed7e | 749 | |
94324dae | 750 | @item Control flow graph cleanup |
73a8ed7e | 751 | |
6de9cd9a DN |
752 | This pass removes unreachable code, simplifies jumps to next, jumps to |
753 | jump, jumps across jumps, etc. The pass is run multiple times. | |
754 | For historical reasons, it is occasionally referred to as the ``jump | |
f0eb93a8 | 755 | optimization pass''. The bulk of the code for this pass is in |
6de9cd9a DN |
756 | @file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c} |
757 | and @file{jump.c}. | |
73a8ed7e | 758 | |
a52b023a PB |
759 | @item Forward propagation of single-def values |
760 | ||
761 | This pass attempts to remove redundant computation by substituting | |
762 | variables that come from a single definition, and | |
763 | seeing if the result can be simplified. It performs copy propagation | |
764 | and addressing mode selection. The pass is run twice, with values | |
94324dae EB |
765 | being propagated into loops only on the second run. The code is |
766 | located in @file{fwprop.c}. | |
a52b023a | 767 | |
6de9cd9a DN |
768 | @item Common subexpression elimination |
769 | ||
f0eb93a8 | 770 | This pass removes redundant computation within basic blocks, and |
6de9cd9a | 771 | optimizes addressing modes based on cost. The pass is run twice. |
94324dae | 772 | The code for this pass is located in @file{cse.c}. |
6de9cd9a | 773 | |
94324dae | 774 | @item Global common subexpression elimination |
6de9cd9a DN |
775 | |
776 | This pass performs two | |
73a8ed7e JM |
777 | different types of GCSE depending on whether you are optimizing for |
778 | size or not (LCM based GCSE tends to increase code size for a gain in | |
779 | speed, while Morel-Renvoise based GCSE does not). | |
780 | When optimizing for size, GCSE is done using Morel-Renvoise Partial | |
781 | Redundancy Elimination, with the exception that it does not try to move | |
782 | invariants out of loops---that is left to the loop optimization pass. | |
783 | If MR PRE GCSE is done, code hoisting (aka unification) is also done, as | |
784 | well as load motion. | |
785 | If you are optimizing for speed, LCM (lazy code motion) based GCSE is | |
786 | done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM | |
787 | based GCSE also does loop invariant code motion. We also perform load | |
788 | and store motion when optimizing for speed. | |
789 | Regardless of which type of GCSE is used, the GCSE pass also performs | |
790 | global constant and copy propagation. | |
73a8ed7e JM |
791 | The source file for this pass is @file{gcse.c}, and the LCM routines |
792 | are in @file{lcm.c}. | |
793 | ||
6de9cd9a | 794 | @item Loop optimization |
73a8ed7e | 795 | |
efa1cdf0 ZD |
796 | This pass performs several loop related optimizations. |
797 | The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain | |
798 | generic loop analysis and manipulation code. Initialization and finalization | |
799 | of loop structures is handled by @file{loop-init.c}. | |
800 | A loop invariant motion pass is implemented in @file{loop-invariant.c}. | |
c4597c1d TS |
801 | Basic block level optimizations---unrolling, and peeling loops--- |
802 | are implemented in @file{loop-unroll.c}. | |
efa1cdf0 ZD |
803 | Replacing of the exit condition of loops by special machine-dependent |
804 | instructions is handled by @file{loop-doloop.c}. | |
01a132bb | 805 | |
6de9cd9a | 806 | @item Jump bypassing |
73a8ed7e | 807 | |
6de9cd9a DN |
808 | This pass is an aggressive form of GCSE that transforms the control |
809 | flow graph of a function by propagating constants into conditional | |
810 | branch instructions. The source file for this pass is @file{gcse.c}. | |
a0134312 | 811 | |
6de9cd9a | 812 | @item If conversion |
a0134312 | 813 | |
6de9cd9a DN |
814 | This pass attempts to replace conditional branches and surrounding |
815 | assignments with arithmetic, boolean value producing comparison | |
816 | instructions, and conditional move instructions. In the very last | |
55a2c322 | 817 | invocation after reload/LRA, it will generate predicated instructions |
94324dae | 818 | when supported by the target. The code is located in @file{ifcvt.c}. |
a0134312 | 819 | |
6de9cd9a | 820 | @item Web construction |
62551c66 | 821 | |
6de9cd9a DN |
822 | This pass splits independent uses of each pseudo-register. This can |
823 | improve effect of the other transformation, such as CSE or register | |
94324dae | 824 | allocation. The code for this pass is located in @file{web.c}. |
73a8ed7e | 825 | |
6de9cd9a DN |
826 | @item Instruction combination |
827 | ||
828 | This pass attempts to combine groups of two or three instructions that | |
829 | are related by data flow into single instructions. It combines the | |
830 | RTL expressions for the instructions by substitution, simplifies the | |
831 | result using algebra, and then attempts to match the result against | |
94324dae | 832 | the machine description. The code is located in @file{combine.c}. |
6de9cd9a | 833 | |
94324dae | 834 | @item Mode switching optimization |
6de9cd9a DN |
835 | |
836 | This pass looks for instructions that require the processor to be in a | |
837 | specific ``mode'' and minimizes the number of mode changes required to | |
838 | satisfy all users. What these modes are, and what they apply to are | |
94324dae EB |
839 | completely target-specific. The code for this pass is located in |
840 | @file{mode-switching.c}. | |
6de9cd9a | 841 | |
e5626198 AZ |
842 | @cindex modulo scheduling |
843 | @cindex sms, swing, software pipelining | |
f0eb93a8 | 844 | @item Modulo scheduling |
e5626198 | 845 | |
f0eb93a8 JM |
846 | This pass looks at innermost loops and reorders their instructions |
847 | by overlapping different iterations. Modulo scheduling is performed | |
94324dae EB |
848 | immediately before instruction scheduling. The code for this pass is |
849 | located in @file{modulo-sched.c}. | |
e5626198 | 850 | |
6de9cd9a DN |
851 | @item Instruction scheduling |
852 | ||
853 | This pass looks for instructions whose output will not be available by | |
854 | the time that it is used in subsequent instructions. Memory loads and | |
855 | floating point instructions often have this behavior on RISC machines. | |
856 | It re-orders instructions within a basic block to try to separate the | |
857 | definition and use of items that otherwise would cause pipeline | |
858 | stalls. This pass is performed twice, before and after register | |
94324dae | 859 | allocation. The code for this pass is located in @file{haifa-sched.c}, |
6de9cd9a DN |
860 | @file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and |
861 | @file{sched-vis.c}. | |
862 | ||
863 | @item Register allocation | |
864 | ||
865 | These passes make sure that all occurrences of pseudo registers are | |
866 | eliminated, either by allocating them to a hard register, replacing | |
867 | them by an equivalent expression (e.g.@: a constant) or by placing | |
fcb204ce MM |
868 | them on the stack. This is done in several subpasses: |
869 | ||
870 | @itemize @bullet | |
73a8ed7e | 871 | @item |
2af2dbdc | 872 | The integrated register allocator (@acronym{IRA}). It is called |
058e97ec VM |
873 | integrated because coalescing, register live range splitting, and hard |
874 | register preferencing are done on-the-fly during coloring. It also | |
55a2c322 VM |
875 | has better integration with the reload/LRA pass. Pseudo-registers spilled |
876 | by the allocator or the reload/LRA have still a chance to get | |
877 | hard-registers if the reload/LRA evicts some pseudo-registers from | |
058e97ec VM |
878 | hard-registers. The allocator helps to choose better pseudos for |
879 | spilling based on their live ranges and to coalesce stack slots | |
880 | allocated for the spilled pseudo-registers. IRA is a regional | |
881 | register allocator which is transformed into Chaitin-Briggs allocator | |
882 | if there is one region. By default, IRA chooses regions using | |
883 | register pressure but the user can force it to use one region or | |
884 | regions corresponding to all loops. | |
885 | ||
886 | Source files of the allocator are @file{ira.c}, @file{ira-build.c}, | |
887 | @file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c}, | |
888 | @file{ira-emit.c}, @file{ira-lives}, plus header files @file{ira.h} | |
889 | and @file{ira-int.h} used for the communication between the allocator | |
890 | and the rest of the compiler and between the IRA files. | |
891 | ||
73a8ed7e JM |
892 | @cindex reloading |
893 | @item | |
894 | Reloading. This pass renumbers pseudo registers with the hardware | |
895 | registers numbers they were allocated. Pseudo registers that did not | |
896 | get hard registers are replaced with stack slots. Then it finds | |
897 | instructions that are invalid because a value has failed to end up in | |
898 | a register, or has ended up in a register of the wrong kind. It fixes | |
899 | up these instructions by reloading the problematical values | |
900 | temporarily into registers. Additional instructions are generated to | |
901 | do the copying. | |
902 | ||
903 | The reload pass also optionally eliminates the frame pointer and inserts | |
904 | instructions to save and restore call-clobbered registers around calls. | |
905 | ||
906 | Source files are @file{reload.c} and @file{reload1.c}, plus the header | |
907 | @file{reload.h} used for communication between them. | |
55a2c322 VM |
908 | |
909 | @cindex Local Register Allocator (LRA) | |
910 | @item | |
911 | This pass is a modern replacement of the reload pass. Source files | |
912 | are @file{lra.c}, @file{lra-assign.c}, @file{lra-coalesce.c}, | |
913 | @file{lra-constraints.c}, @file{lra-eliminations.c}, | |
914 | @file{lra-equivs.c}, @file{lra-lives.c}, @file{lra-saves.c}, | |
915 | @file{lra-spills.c}, the header @file{lra-int.h} used for | |
916 | communication between them, and the header @file{lra.h} used for | |
917 | communication between LRA and the rest of compiler. | |
918 | ||
919 | Unlike the reload pass, intermediate LRA decisions are reflected in | |
920 | RTL as much as possible. This reduces the number of target-dependent | |
921 | macros and hooks, leaving instruction constraints as the primary | |
922 | source of control. | |
923 | ||
924 | LRA is run on targets for which TARGET_LRA_P returns true. | |
fcb204ce | 925 | @end itemize |
73a8ed7e | 926 | |
6de9cd9a | 927 | @item Basic block reordering |
73a8ed7e | 928 | |
6de9cd9a DN |
929 | This pass implements profile guided code positioning. If profile |
930 | information is not available, various types of static analysis are | |
931 | performed to make the predictions normally coming from the profile | |
932 | feedback (IE execution frequency, branch probability, etc). It is | |
933 | implemented in the file @file{bb-reorder.c}, and the various | |
934 | prediction routines are in @file{predict.c}. | |
73a8ed7e | 935 | |
6de9cd9a DN |
936 | @item Variable tracking |
937 | ||
938 | This pass computes where the variables are stored at each | |
014a1138 JZ |
939 | position in code and generates notes describing the variable locations |
940 | to RTL code. The location lists are then generated according to these | |
941 | notes to debug information if the debugging information format supports | |
94324dae | 942 | location lists. The code is located in @file{var-tracking.c}. |
014a1138 | 943 | |
6de9cd9a | 944 | @item Delayed branch scheduling |
014a1138 | 945 | |
6de9cd9a | 946 | This optional pass attempts to find instructions that can go into the |
94324dae EB |
947 | delay slots of other instructions, usually jumps and calls. The code |
948 | for this pass is located in @file{reorg.c}. | |
73a8ed7e | 949 | |
6de9cd9a DN |
950 | @item Branch shortening |
951 | ||
952 | On many RISC machines, branch instructions have a limited range. | |
953 | Thus, longer sequences of instructions must be used for long branches. | |
954 | In this pass, the compiler figures out what how far each instruction | |
955 | will be from each other instruction, and therefore whether the usual | |
956 | instructions, or the longer sequences, must be used for each branch. | |
94324dae | 957 | The code for this pass is located in @file{final.c}. |
6de9cd9a DN |
958 | |
959 | @item Register-to-stack conversion | |
73a8ed7e | 960 | |
73a8ed7e JM |
961 | Conversion from usage of some hard registers to usage of a register |
962 | stack may be done at this point. Currently, this is supported only | |
94324dae EB |
963 | for the floating-point registers of the Intel 80387 coprocessor. The |
964 | code for this pass is located in @file{reg-stack.c}. | |
73a8ed7e | 965 | |
6de9cd9a | 966 | @item Final |
73a8ed7e | 967 | |
6de9cd9a DN |
968 | This pass outputs the assembler code for the function. The source files |
969 | are @file{final.c} plus @file{insn-output.c}; the latter is generated | |
970 | automatically from the machine description by the tool @file{genoutput}. | |
971 | The header file @file{conditions.h} is used for communication between | |
98906124 | 972 | these files. |
73a8ed7e | 973 | |
6de9cd9a | 974 | @item Debugging information output |
73a8ed7e | 975 | |
6de9cd9a DN |
976 | This is run after final because it must output the stack slot offsets |
977 | for pseudo registers that did not get hard registers. Source files | |
978 | are @file{dbxout.c} for DBX symbol table format, @file{sdbout.c} for | |
979 | SDB symbol table format, @file{dwarfout.c} for DWARF symbol table | |
980 | format, files @file{dwarf2out.c} and @file{dwarf2asm.c} for DWARF2 | |
981 | symbol table format, and @file{vmsdbgout.c} for VMS debug symbol table | |
982 | format. | |
73a8ed7e | 983 | |
73a8ed7e | 984 | @end itemize |
b1055be0 SS |
985 | |
986 | @node Optimization info | |
987 | @section Optimization info | |
988 | @include optinfo.texi |