@c Copyright (c) 2006, 2007, 2008 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@c ---------------------------------------------------------------------
@c Loop Representation
@c ---------------------------------------------------------------------

@node Loop Analysis and Representation
@chapter Analysis and Representation of Loops

GCC provides extensive infrastructure for working with natural loops,
i.e., strongly connected components of the CFG with only one entry
block.  This chapter describes the representation of loops in GCC, both
on GIMPLE and on RTL, as well as the interfaces to loop-related
analyses (induction variable analysis and number of iterations
analysis).

@menu
* Loop representation::         Representation and analysis of loops.
* Loop querying::               Getting information about loops.
* Loop manipulation::           Loop manipulation functions.
* LCSSA::                       Loop-closed SSA form.
* Scalar evolutions::           Induction variables on GIMPLE.
* loop-iv::                     Induction variables on RTL.
* Number of iterations::        Number of iterations analysis.
* Dependency analysis::         Data dependency analysis.
* Lambda::                      Linear loop transformations framework.
* Omega::                       A solver for linear programming problems.
@end menu

@node Loop representation
@section Loop representation
@cindex Loop representation
@cindex Loop analysis

This chapter describes the representation of loops in GCC, and the
functions that can be used to build, modify and analyze this
representation.  Most of the interfaces and data structures are
declared in @file{cfgloop.h}.  At the moment, loop structures are
analyzed and this information is updated only by the optimization
passes that deal with loops, but some efforts are being made to make it
available throughout most of the optimization passes.

In general, a natural loop has one entry block (header) and possibly
several back edges (latches) leading to the header from the inside of
the loop.  Loops with several latches may appear if several loops share
a single header, or if there is a branching in the middle of the loop.
The representation of loops in GCC however allows only loops with a
single latch.  During loop analysis, headers of such loops are split
and forwarder blocks are created in order to disambiguate their
structures.  A heuristic based on profile information and the structure
of the induction variables in the loops is used to determine whether
the latches correspond to sub-loops or to control flow in a single
loop.  This means that the analysis sometimes changes the CFG, and if
you run it in the middle of an optimization pass, you must be able to
deal with the new blocks.  You may avoid CFG changes by passing the
@code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery;
note, however, that most other loop manipulation functions will not
work correctly for loops with multiple latch edges (the functions that
only query membership of blocks to loops and subloop relationships, or
enumerate and test loop exits, can be expected to work).

The body of the loop is the set of blocks that are dominated by its
header, and reachable from its latch against the direction of edges in
the CFG@.  The loops are organized in a containment hierarchy (tree)
such that all the loops immediately contained inside loop L are the
children of L in the tree.  This tree is represented by the
@code{struct loops} structure.  The root of this tree is a fake loop
that contains all blocks in the function.  Each of the loops is
represented in a @code{struct loop} structure.  Each loop is assigned
an index (@code{num} field of the @code{struct loop} structure), and
the pointer to the loop is stored in the corresponding field of the
@code{larray} vector in the loops structure.  The indices do not have
to be contiguous; there may be empty (@code{NULL}) entries in the
@code{larray} created by deleting loops.  Also, there is no guarantee
on the relative order of a loop and its subloops in the numbering.  The
index of a loop never changes.

The entries of the @code{larray} field should not be accessed directly.
The function @code{get_loop} returns the loop description for a loop
with the given index.  The @code{number_of_loops} function returns the
number of loops in the function.  To traverse all loops, use the
@code{FOR_EACH_LOOP} macro.  The @code{flags} argument of the macro is
used to determine the direction of traversal and the set of loops
visited.  Each loop is guaranteed to be visited exactly once,
regardless of the changes to the loop tree, and the loops may be
removed during the traversal.  The newly created loops are never
traversed; if they need to be visited, this must be done separately
after their creation.  The @code{FOR_EACH_LOOP} macro allocates
temporary variables.  If the @code{FOR_EACH_LOOP} loop were ended using
@code{break} or @code{goto}, they would not be released; the
@code{FOR_EACH_LOOP_BREAK} macro must be used instead.
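
For illustration, a minimal sketch of a traversal (assuming the loop
optimizer has been initialized; the iterator details vary between GCC
versions, and @code{process_loop} and @code{done_with_loops_p} are
hypothetical helpers):

@smallexample
loop_iterator li;
struct loop *loop;

/* Visit inner loops before their super-loops.  */
FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
  @{
    process_loop (loop);            /* Hypothetical per-loop work.  */

    if (done_with_loops_p ())       /* Hypothetical predicate.  */
      FOR_EACH_LOOP_BREAK (li);     /* Releases the temporaries.  */
  @}
@end smallexample
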

Each basic block contains the reference to the innermost loop it
belongs to (@code{loop_father}).  For this reason, it is only possible
to have one @code{struct loops} structure initialized at the same time
for each CFG@.  The global variable @code{current_loops} contains the
@code{struct loops} structure.  Many of the loop manipulation functions
assume that dominance information is up-to-date.

The loops are analyzed through the @code{loop_optimizer_init} function.
The argument of this function is a set of flags represented in an
integer bitmask.  These flags specify what other properties of the loop
structures should be calculated/enforced and preserved later:

@itemize
@item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
changes to the CFG will be performed in the loop analysis, in
particular, loops with multiple latch edges will not be disambiguated.
If a loop has multiple latches, its latch block is set to NULL@.  Most
of the loop manipulation functions will not work for loops in this
shape.  No other flags that require CFG changes can be passed to
@code{loop_optimizer_init}.
@item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
a way that each loop has only one entry edge, and additionally, the
source block of this entry edge has only one successor.  This creates a
natural place where the code can be moved out of the loop, and ensures
that the entry edge of the loop leads from its immediate super-loop.
@item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
force the latch block of each loop to have only one successor.  This
ensures that the latch of the loop does not belong to any of its
sub-loops, and makes manipulation with the loops significantly easier.
Most of the loop manipulation functions assume that the loops are in
this shape.  Note that with this flag, the ``normal'' loop without any
control flow inside and with one exit consists of two basic blocks.
@item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
edges in the strongly connected components that are not natural loops
(have more than one entry block) are marked with
@code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
flag is not set for blocks and edges that belong to natural loops that
are in such an irreducible region (but it is set for the entry and exit
edges of such a loop, if they lead to/from this region).
@item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
and updated for each loop.  This makes some functions (e.g.,
@code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
@code{single_exit}) can be used only if the lists of exits are
recorded.
@end itemize

These properties may also be computed/enforced later, using the
functions @code{create_preheaders}, @code{force_single_succ_latches},
@code{mark_irreducible_loops} and @code{record_loop_exits}.

The memory occupied by the loops structures should be freed with the
@code{loop_optimizer_finalize} function.
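
For example, a pass might bracket its loop work as in the following
hedged sketch (the flag combination shown is illustrative; choose the
properties your pass actually needs):

@smallexample
loop_optimizer_init (LOOPS_HAVE_PREHEADERS
                     | LOOPS_HAVE_SIMPLE_LATCHES
                     | LOOPS_HAVE_RECORDED_EXITS);

/* ... analyze and transform loops here ...  */

loop_optimizer_finalize ();
@end smallexample
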
145 | ||
146 | The CFG manipulation functions in general do not update loop structures. | |
147 | Specialized versions that additionally do so are provided for the most | |
148 | common tasks. On GIMPLE, @code{cleanup_tree_cfg_loop} function can be | |
149 | used to cleanup CFG while updating the loops structures if | |
150 | @code{current_loops} is set. | |
151 | ||
@node Loop querying
@section Loop querying
@cindex Loop querying

The functions to query the information about loops are declared in
@file{cfgloop.h}.  Some of the information can be taken directly from
the structures.  The @code{loop_father} field of each basic block
contains the innermost loop to which the block belongs.  The most
useful fields of the loop structure (that are kept up-to-date at all
times) are:

@itemize
@item @code{header}, @code{latch}: Header and latch basic blocks of the
loop.
@item @code{num_nodes}: Number of basic blocks in the loop (including
the basic blocks of the sub-loops).
@item @code{depth}: The depth of the loop in the loops tree, i.e., the
number of super-loops of the loop.
@item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
sub-loop, and the sibling of the loop in the loops tree.
@end itemize

There are other fields in the loop structures, many of them used only by
some of the passes, or not updated during CFG changes; in general, they
should not be accessed directly.

The most important functions to query loop structures are:

@itemize
@item @code{flow_loops_dump}: Dumps the information about loops to a
file.
@item @code{verify_loop_structure}: Checks consistency of the loop
structures.
@item @code{loop_latch_edge}: Returns the latch edge of a loop.
@item @code{loop_preheader_edge}: If loops have preheaders, returns
the preheader edge of a loop.
@item @code{flow_loop_nested_p}: Tests whether a loop is a sub-loop of
another loop.
@item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
to a loop (including its sub-loops).
@item @code{find_common_loop}: Finds the common super-loop of two loops.
@item @code{superloop_at_depth}: Returns the super-loop of a loop with
the given depth.
@item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimate the
number of insns in the loop, on GIMPLE and on RTL.
@item @code{loop_exit_edge_p}: Tests whether an edge is an exit from a
loop.
@item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
with the @code{EDGE_LOOP_EXIT} flag.
@item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
@code{get_loop_body_in_bfs_order}: Enumerate the basic blocks in the
loop in depth-first search order in reversed CFG, ordered by dominance
relation, and breadth-first search order, respectively.
@item @code{single_exit}: Returns the single exit edge of the loop, or
@code{NULL} if the loop has more than one exit.  You can only use this
function if the @code{LOOPS_HAVE_RECORDED_EXITS} property is used.
@item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
@item @code{just_once_each_iteration_p}: Returns true if the basic block
is executed exactly once during each iteration of a loop (that is, it
does not belong to a sub-loop, and it dominates the latch of the loop).
@end itemize
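
As a hedged example of combining these queries (assuming, as in the
sources of this era, that @code{get_loop_body} returns a heap-allocated
array of @code{loop->num_nodes} blocks that the caller must free):

@smallexample
basic_block *body = get_loop_body (loop);
unsigned i;

for (i = 0; i < loop->num_nodes; i++)
  @{
    basic_block bb = body[i];

    /* Blocks of sub-loops are included; skip them here.  */
    if (bb->loop_father != loop)
      continue;

    /* ... inspect BB ...  */
  @}

free (body);
@end smallexample
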
212 | ||
213 | @node Loop manipulation | |
214 | @section Loop manipulation | |
215 | @cindex Loop manipulation | |
216 | ||
217 | The loops tree can be manipulated using the following functions: | |
218 | ||
219 | @itemize | |
220 | @item @code{flow_loop_tree_node_add}: Adds a node to the tree. | |
221 | @item @code{flow_loop_tree_node_remove}: Removes a node from the tree. | |
222 | @item @code{add_bb_to_loop}: Adds a basic block to a loop. | |
223 | @item @code{remove_bb_from_loops}: Removes a basic block from loops. | |
224 | @end itemize | |
225 | ||
Most low-level CFG functions update loops automatically.  The following
functions handle some more complicated cases of CFG manipulations:

@itemize
@item @code{remove_path}: Removes an edge and all blocks it dominates.
@item @code{split_loop_exit_edge}: Splits an exit edge of the loop,
ensuring that PHI node arguments remain in the loop (this ensures that
loop-closed SSA form is preserved).  Only useful on GIMPLE.
@end itemize

Finally, there are some higher-level loop transformations implemented.
While some of them are written so that they should work on
non-innermost loops, they are mostly untested in that case, and at the
moment, they are only reliable for the innermost loops:

@itemize
@item @code{create_iv}: Creates a new induction variable.  Only works on
GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
suitable place for the iv increment.
@item @code{duplicate_loop_to_header_edge},
@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
on GIMPLE) duplicate the body of the loop a prescribed number of times
on one of the edges entering the loop header, thus performing either
loop unrolling or loop peeling.  @code{can_duplicate_loop_p}
(@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
loop.
@item @code{loop_version}, @code{tree_ssa_loop_version}: These functions
create a copy of a loop, and a branch before them that selects one of
them depending on the prescribed condition.  This is useful for
optimizations that need to verify some assumptions at runtime (one of
the copies of the loop is usually left unchanged, while the other one is
transformed in some way).
@item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
extra iterations to make the number of iterations divisible by the
unroll factor, updating the exit condition, and removing the exits that
now cannot be taken.  Works only on GIMPLE.
@end itemize
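
As an illustration, a hedged sketch of creating a new counter with
@code{create_iv} (the signature follows the
@code{block_stmt_iterator}-based interface of this era and may differ
in other versions):

@smallexample
block_stmt_iterator incr_pos;
bool insert_after;
tree var_before, var_after;

/* Find the canonical position for the increment of the new iv.  */
standard_iv_increment_position (loop, &incr_pos, &insert_after);

/* Create an iv with values 0, 1, 2, ... in successive iterations.  */
create_iv (build_int_cst (unsigned_type_node, 0),
           build_int_cst (unsigned_type_node, 1),
           NULL_TREE, loop, &incr_pos, insert_after,
           &var_before, &var_after);
@end smallexample
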
263 | ||
264 | @node LCSSA | |
265 | @section Loop-closed SSA form | |
266 | @cindex LCSSA | |
267 | @cindex Loop-closed SSA form | |
268 | ||
269 | Throughout the loop optimizations on tree level, one extra condition is | |
270 | enforced on the SSA form: No SSA name is used outside of the loop in | |
271 | that it is defined. The SSA form satisfying this condition is called | |
0ee2ea09 | 272 | ``loop-closed SSA form'' -- LCSSA@. To enforce LCSSA, PHI nodes must be |
01e0ef5a ZD |
273 | created at the exits of the loops for the SSA names that are used |
274 | outside of them. Only the real operands (not virtual SSA names) are | |
275 | held in LCSSA, in order to save memory. | |
276 | ||
There are various benefits of LCSSA:

@itemize
@item Many optimizations (value range analysis, final value
replacement) are interested in the values that are defined in the loop
and used outside of it, i.e., exactly those for which we create new PHI
nodes.
@item In induction variable analysis, it is not necessary to specify
the loop in which the analysis should be performed -- the scalar
evolution analysis always returns the results with respect to the loop
in which the SSA name is defined.
@item It makes updating of SSA form during loop transformations simpler.
Without LCSSA, operations like loop unrolling may force creation of PHI
nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
updated locally.  However, since we only keep real operands in LCSSA, we
cannot use this advantage (we could have local updating of real
operands, but it is not much more efficient than to use generic SSA form
updating for it as well; the amount of changes to SSA is the same).
@end itemize

However, it also means LCSSA must be updated.  This is usually
straightforward, unless you create a new value in a loop and use it
outside, or unless you manipulate loop exit edges (functions are
provided to make these manipulations simple).
@code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
LCSSA is preserved.

@node Scalar evolutions
@section Scalar evolutions
@cindex Scalar evolutions
@cindex IV analysis on GIMPLE

Scalar evolutions (SCEV) are used to represent results of induction
variable analysis on GIMPLE@.  They enable us to represent variables
with complicated behavior in a simple and consistent way (we only use
it to express values of polynomial induction variables, but it is
possible to extend it).  The interfaces to SCEV analysis are declared
in @file{tree-scalar-evolution.h}.  To use scalar evolutions analysis,
@code{scev_initialize} must be called.  To stop using SCEV,
@code{scev_finalize} should be called.  SCEV analysis caches results in
order to save time and memory.  This cache however is made invalid by
most of the loop transformations, including removal of code.  If such a
transformation is performed, @code{scev_reset} must be called to clean
the caches.

Given an SSA name, its behavior in loops can be analyzed using the
@code{analyze_scalar_evolution} function.  The returned SCEV however
does not have to be fully analyzed and it may contain references to
other SSA names defined in the loop.  To resolve these (potentially
recursive) references, the @code{instantiate_parameters} or
@code{resolve_mixers} functions must be used.
@code{instantiate_parameters} is useful when you use the results of
SCEV only for some analysis, and when you work with the whole nest of
loops at once.  It will try replacing all SSA names by their SCEVs in
all loops, including the super-loops of the current loop, thus
providing complete information about the behavior of the variable in
the loop nest.  @code{resolve_mixers} is useful if you work with only
one loop at a time, and if you possibly need to create code based on
the value of the induction variable.  It will only resolve the SSA
names defined in the current loop, leaving the SSA names defined
outside unchanged, even if their evolution in the outer loops is known.

The SCEV is a normal tree expression, except for the fact that it may
contain several special tree nodes.  One of them is
@code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
expressed.  The other one is @code{POLYNOMIAL_CHREC}.  A polynomial
chrec has three arguments -- base, step and loop (both base and step
may contain further polynomial chrecs).  The type of the expression and
of the base and step must be the same.  A variable has evolution
@code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
loop) equivalent to @code{x_1} in the following example:

@smallexample
while (@dots{})
  @{
    x_1 = phi (base, x_2);
    x_2 = x_1 + step;
  @}
@end smallexample

Note that this includes the language restrictions on the operations.
For example, if we compile C code and @code{x} has signed type, then
the overflow in addition would cause undefined behavior, and we may
assume that this does not happen.  Hence, the value with this SCEV
cannot overflow (which restricts the number of iterations of such a
loop).

In many cases, one wants to restrict the attention just to affine
induction variables.  In this case, the extra expressive power of SCEV
is not useful, and may complicate the optimizations.  The
@code{simple_iv} function may then be used to analyze a value -- the
result is a loop-invariant base and step.

@node loop-iv
@section IV analysis on RTL
@cindex IV analysis on RTL

The induction variable analysis on RTL is simpler: it only supports
affine induction variables, analyzed in one loop at a time.  The
interface is declared in @file{cfgloop.h}.  Before analyzing induction
variables in a loop L, the @code{iv_analysis_loop_init} function must
be called on L.  After the analysis (possibly calling
@code{iv_analysis_loop_init} for several loops) is finished,
@code{iv_analysis_done} should be called.  The following functions can
be used to access the results of the analysis:

@itemize
@item @code{iv_analyze}: Analyzes a single register used in the given
insn.  If no use of the register in this insn is found, the following
insns are scanned, so that this function can be called on the insn
returned by @code{get_condition}.
@item @code{iv_analyze_result}: Analyzes result of the assignment in
the given insn.
@item @code{iv_analyze_expr}: Analyzes a more complicated expression.
All its operands are analyzed by @code{iv_analyze}, and hence they must
be used in the specified insn or one of the following insns.
@end itemize

The description of the induction variable is provided in @code{struct
rtx_iv}.  In order to handle subregs, the representation is a bit
complicated; if the value of the @code{extend} field is not
@code{UNKNOWN}, the value of the induction variable in the i-th
iteration is

@smallexample
delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
@end smallexample

with the following exception: if @code{first_special} is true, then the
value in the first iteration (when @code{i} is zero) is @code{delta +
mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
then @code{first_special} must be false, @code{delta} must be 0,
@code{mult} must be 1, and the value in the i-th iteration is

@smallexample
subreg_@{mode@} (base + i * step)
@end smallexample

The function @code{get_iv_value} can be used to perform these
calculations.

@node Number of iterations
@section Number of iterations analysis
@cindex Number of iterations analysis

Both on GIMPLE and on RTL, there are functions available to determine
the number of iterations of a loop, with a similar interface.  The
number of iterations of a loop in GCC is defined as the number of
executions of the loop latch.  In many cases, it is not possible to
determine the number of iterations unconditionally -- the determined
number is correct only if some assumptions are satisfied.  The analysis
tries to verify these conditions using the information contained in the
program; if it fails, the conditions are returned together with the
result.  The following information and conditions are provided by the
analysis:

@itemize
@item @code{assumptions}: If this condition is false, the rest of
the information is invalid.
@item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
this condition is true, the loop exits in the first iteration.
@item @code{infinite}: If this condition is true, the loop is infinite.
This condition is only available on RTL@.  On GIMPLE, conditions for
finiteness of the loop are included in @code{assumptions}.
@item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
that gives the number of iterations.  The number of iterations is
defined as the number of executions of the loop latch.
@end itemize

Both on GIMPLE and on RTL, it is necessary for the induction variable
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
On GIMPLE, the results are stored to the @code{struct tree_niter_desc}
structure.  The number of iterations before the loop is exited through
a given exit can be determined using the
@code{number_of_iterations_exit} function.  On RTL, the results are
returned in the @code{struct niter_desc} structure.  The corresponding
function is named @code{check_simple_exit}.  There are also functions
that pass through all the exits of a loop and try to find one whose
number of iterations is easy to determine -- @code{find_loop_niter} on
GIMPLE and @code{find_simple_exit} on RTL@.  Finally, there are
functions that provide the same information, but additionally cache it,
so that repeated queries for the number of iterations are not so costly
-- @code{number_of_latch_executions} on GIMPLE and
@code{get_simple_loop_desc} on RTL.

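
A hedged GIMPLE-side sketch (assuming SCEV is initialized and
@code{exit} is an exit edge of @code{loop}):

@smallexample
struct tree_niter_desc niter;

if (number_of_iterations_exit (loop, exit, &niter, true))
  @{
    /* NITER.niter gives the iteration count, valid only under
       NITER.assumptions; NITER.may_be_zero covers the case of
       exiting in the first iteration.  */
    if (integer_onep (niter.assumptions)
        && integer_zerop (niter.may_be_zero))
      @{
        /* niter.niter is unconditionally correct here.  */
      @}
  @}
@end smallexample
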
Note that some of these functions may behave slightly differently than
others -- some of them return only the expression for the number of
iterations, and fail if there are some assumptions.  The function
@code{number_of_latch_executions} works only for single-exit loops.
The function @code{number_of_cond_exit_executions} can be used to
determine the number of executions of the exit condition of a
single-exit loop (i.e., the @code{number_of_latch_executions} increased
by one).

@node Dependency analysis
@section Data Dependency Analysis
@cindex Data Dependency Analysis

The code for the data dependence analysis can be found in
@file{tree-data-ref.c} and its interface and data structures are
described in @file{tree-data-ref.h}.  The function that computes the
data dependences for all the array and pointer references for a given
loop is @code{compute_data_dependences_for_loop}.  This function is
currently used by the linear loop transform and the vectorization
passes.  Before calling this function, one has to allocate two vectors:
the first vector will contain the set of data references that are
contained in the analyzed loop body, and the second vector will contain
the dependence relations between the data references.  Thus if the
vector of data references is of size @code{n}, the vector containing
the dependence relations will contain @code{n*n} elements.  However if
the analyzed loop contains side effects, such as calls that potentially
can interfere with the data references in the current analyzed loop,
the analysis stops while scanning the loop body for data references,
and inserts a single @code{chrec_dont_know} in the dependence relation
array.

The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution
order, and the data references of each statement are pushed at the end
of the data reference array.  Two data references syntactically occur
in the program in the same order as in the array of data references.
This syntactic order is important in some classical data dependence
tests, and mapping this order to the elements of this array avoids
costly queries to the loop body representation.

Three types of data references are currently handled: ARRAY_REF,
INDIRECT_REF and COMPONENT_REF@.  The data structure for the data
reference is @code{data_reference}, where @code{data_reference_p} is a
name of a pointer to the data reference structure.  The structure
contains the following elements:

@itemize
@item @code{base_object_info}: Provides information about the base object
of the data reference and its access functions.  These access functions
represent the evolution of the data reference in the loop relative to
its base, in keeping with the classical meaning of the data reference
access function for the support of arrays.  For example, for a reference
@code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
one for each array subscript, are:
@code{@{i_init, +, i_step@}_1, @{j_init, +, j_step@}_2}.

@item @code{first_location_in_loop}: Provides information about the first
location accessed by the data reference in the loop and about the access
function used to represent evolution relative to this location.  This data
is used to support pointers, and is not used for arrays (for which we
have base objects).  Pointer accesses are represented as a one-dimensional
access that starts from the first location accessed in the loop.  For
example:

@smallexample
for1 i
 for2 j
  *((int *)p + i + j) = a[i][j];
@end smallexample

The access function of the pointer access is @code{@{0, +, 4B@}_for2}
relative to @code{p + i}.  The access functions of the array are
@code{@{i_init, +, i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
relative to @code{a}.

Usually, the object the pointer refers to is either unknown, or we can't
prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two
representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows:
If both @code{a} and @code{b} are represented as arrays, compare
@code{a.base_object} and @code{b.base_object};
if they are equal, apply dependence tests (use access functions based on
base_objects).
Else if both @code{a} and @code{b} are represented as pointers, compare
@code{a.first_location} and @code{b.first_location};
if they are equal, apply dependence tests (use access functions based on
first location).
However, if @code{a} and @code{b} are represented differently, only try
to prove that the bases are definitely different.

@item Aliasing information.
@item Alignment information.
@end itemize

The structure describing the relation between two data references is
@code{data_dependence_relation} and the shorter name for a pointer to
such a structure is @code{ddr_p}.  This structure contains:

@itemize
@item a pointer to each data reference,
@item a tree node @code{are_dependent} that is set to @code{chrec_known}
if the analysis has proved that there is no dependence between these two
data references, @code{chrec_dont_know} if the analysis was not able to
determine any useful result and potentially there could exist a
dependence between these data references, and @code{are_dependent} is
set to @code{NULL_TREE} if there exists a dependence relation between
the data references, and the description of this dependence relation is
given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
arrays,
@item a boolean that determines whether the dependence relation can be
represented by a classical distance vector,
@item an array @code{subscripts} that contains a description of each
subscript of the data references.  Given two array accesses, a
subscript is the tuple composed of the access functions for a given
dimension.  For example, given @code{A[f1][f2][f3]} and
@code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
g2), (f3, g3)}.
@item two arrays @code{dir_vects} and @code{dist_vects} that contain
classical representations of the data dependences under the form of
direction and distance dependence vectors,
@item an array of loops @code{loop_nest} that contains the loops to
which the distance and direction vectors refer.
@end itemize

Several functions for pretty printing the information extracted by the
data dependence analysis are available: @code{dump_ddrs} prints with a
maximum verbosity the details of a data dependence relations array,
@code{dump_dist_dir_vectors} prints only the classical distance and
direction vectors for a data dependence relations array, and
@code{dump_data_references} prints the details of the data references
contained in a data reference array.

@node Lambda
@section Linear loop transformations framework
@cindex Linear loop transformations framework

Lambda is a framework that allows transformations of loops using
non-singular matrix based transformations of the iteration space and
loop bounds.  This allows compositions of skewing, scaling, interchange,
and reversal transformations.  These transformations are often used to
improve cache behavior or remove inner loop dependencies to allow
parallelization and vectorization to take place.

To perform these transformations, Lambda requires that the loopnest be
converted into an internal form that can be matrix transformed easily.
To do this conversion, the function
@code{gcc_loopnest_to_lambda_loopnest} is provided.  If the loop cannot
be transformed using lambda, this function will return NULL.

Once a @code{lambda_loopnest} is obtained from the conversion function,
it can be transformed by using @code{lambda_loopnest_transform}, which
takes a transformation matrix to apply.  Note that it is up to the
caller to verify that the transformation matrix is legal to apply to
the loop (dependence respecting, etc.).  Lambda simply applies whatever
matrix it is given.  It can be extended to make legal matrices out of
any non-singular matrix, but this is not currently implemented.
Legality of a matrix for a given loopnest can be verified using
@code{lambda_transform_legal_p}.

Given a transformed loopnest, conversion back into GCC IR is done by
@code{lambda_loopnest_to_gcc_loopnest}.  This function will modify the
loops so that they match the transformed loopnest.


@node Omega
@section Omega, a solver for linear programming problems
@cindex Omega a solver for linear programming problems

The data dependence analysis contains several solvers triggered
sequentially from the less complex ones to the more sophisticated ones.
To ensure the consistency of the results of these solvers, a data
dependence check pass has been implemented based on two different
solvers.  The second method that has been integrated into GCC is based
on the Omega dependence solver, written in the 1990's by William Pugh
and David Wonnacott.  Data dependence tests can be formulated using a
subset of Presburger arithmetic that can be translated to linear
constraint systems.  These linear constraint systems can then be
solved using the Omega solver.

The Omega solver uses Fourier-Motzkin elimination for removing
variables: a linear constraint system containing @code{n} variables
is reduced to a linear constraint system with @code{n-1} variables.
The Omega solver can also be used for solving other problems that can
be expressed under the form of a system of linear equalities and
inequalities.  The Omega solver is known to have an exponential worst
case, also known as the ``omega nightmare'' in the literature, but in
practice, the omega test is known to be efficient for the common data
dependence tests.

The interface used by the Omega solver for describing the linear
programming problems is described in @file{omega.h}, and the solver is
@code{omega_solve_problem}.