..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. _rtl-passes:

RTL passes
**********

The following briefly describes the RTL generation and optimization
passes that are run after the Tree optimization passes.

* RTL generation

  .. Avoiding overfull is tricky here.

  The source files for RTL generation include
  :samp:`stmt.cc`,
  :samp:`calls.cc`,
  :samp:`expr.cc`,
  :samp:`explow.cc`,
  :samp:`expmed.cc`,
  :samp:`function.cc`,
  :samp:`optabs.cc`
  and :samp:`emit-rtl.cc`.
  Also, the file
  :samp:`insn-emit.cc`, generated from the machine description by the
  program ``genemit``, is used in this pass. The header file
  :samp:`expr.h` is used for communication within this pass.

  .. index:: genflags, gencodes

  The header files :samp:`insn-flags.h` and :samp:`insn-codes.h`,
  generated from the machine description by the programs ``genflags``
  and ``gencodes``, tell this pass which standard names are available
  for use and which patterns correspond to them.

* Generation of exception landing pads

  This pass generates the glue that handles communication between the
  exception handling library routines and the exception handlers within
  the function. Entry points in the function that are invoked by the
  exception handling library are called :dfn:`landing pads`. The code
  for this pass is located in :samp:`except.cc`.

* Control flow graph cleanup

  This pass removes unreachable code and simplifies jumps to the next
  instruction, jumps to jumps, jumps across jumps, etc. The pass is run
  multiple times. For historical reasons, it is occasionally referred to
  as the 'jump optimization pass'. The bulk of the code for this pass is in
  :samp:`cfgcleanup.cc`, and there are support routines in :samp:`cfgrtl.cc`
  and :samp:`jump.cc`.
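
As a rough illustration of the simplifications named above, the following
toy model works on a flat list of instructions. It is only a sketch: the
tuple encoding and the function name ``cleanup_jumps`` are assumptions made
for this example and have nothing to do with the data structures used in
:samp:`cfgcleanup.cc`.

```python
def cleanup_jumps(insns):
    """Simplify a list of ('label', name), ('jump', name) and ('op', text)."""
    # Position of every label in the original instruction list.
    label_pos = {arg: i for i, (kind, arg) in enumerate(insns) if kind == "label"}

    def final_target(name):
        # Follow "label: jump other" chains (jump to jump). The 'seen' set
        # guards against cycles such as a loop that jumps to itself.
        seen = set()
        while name not in seen:
            seen.add(name)
            i = label_pos[name] + 1
            while i < len(insns) and insns[i][0] == "label":
                i += 1
            if i < len(insns) and insns[i][0] == "jump":
                name = insns[i][1]
            else:
                break
        return name

    # 1. Retarget jumps whose destination is itself an unconditional jump.
    insns = [(k, final_target(a)) if k == "jump" else (k, a) for k, a in insns]

    # 2. Drop code between an unconditional jump and the next label.
    #    (Conservatively, every label is treated as reachable.)
    live, reachable = [], True
    for kind, arg in insns:
        if kind == "label":
            reachable = True
        if reachable:
            live.append((kind, arg))
        if kind == "jump":
            reachable = False

    # 3. Drop jumps whose target is reached by simply falling through.
    out = []
    for i, (kind, arg) in enumerate(live):
        if kind == "jump":
            j = i + 1
            while j < len(live) and live[j][0] == "label" and live[j][1] != arg:
                j += 1
            if j < len(live) and live[j] == ("label", arg):
                continue
        out.append((kind, arg))
    return out


# A jump to the immediately following label disappears:
print(cleanup_jumps([("jump", "L1"), ("label", "L1"), ("op", "x")]))
# prints [('label', 'L1'), ('op', 'x')]
```

The three steps are kept separate here for readability; a real pass would
iterate, since one simplification can expose another.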

* Forward propagation of single-def values

  This pass attempts to remove redundant computation by substituting
  variables that come from a single definition, and
  seeing if the result can be simplified. It performs copy propagation
  and addressing mode selection. The pass is run twice, with values
  being propagated into loops only on the second run. The code is
  located in :samp:`fwprop.cc`.

* Common subexpression elimination

  This pass removes redundant computation within basic blocks, and
  optimizes addressing modes based on cost. The pass is run twice.
  The code for this pass is located in :samp:`cse.cc`.
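
The idea of removing redundant computation inside a single basic block can
be sketched as follows. This is a deliberately small model, not the
algorithm in :samp:`cse.cc`: the instruction format ``(dest, op, arg1, arg2)``
and the name ``local_cse`` are assumptions for illustration, and commutative
operations and addressing-mode costs are ignored.

```python
def local_cse(block):
    """Rewrite recomputed expressions in one basic block as register copies."""
    available = {}  # (op, arg1, arg2) -> register currently holding that value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            # Same expression already computed and still valid: reuse it.
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, a, b))
        # dest is overwritten, so forget every expression that reads dest
        # or whose value was held in dest.
        available = {k: r for k, r in available.items()
                     if r != dest and dest not in k[1:]}
        # Record the expression unless dest is one of its own operands.
        if dest not in (a, b) and key not in available:
            available[key] = dest
    return out


block = [
    ("t1", "add", "a", "b"),
    ("t2", "add", "a", "b"),   # redundant: becomes a copy of t1
    ("a", "mul", "t1", "c"),   # redefines a, invalidating "a + b"
    ("t3", "add", "a", "b"),   # must be recomputed
]
for insn in local_cse(block):
    print(insn)
```

Only the second instruction is rewritten; the fourth is kept because ``a``
was redefined in between.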

* Global common subexpression elimination

  This pass performs one of two
  different types of GCSE depending on whether you are optimizing for
  size or not (LCM based GCSE tends to increase code size for a gain in
  speed, while Morel-Renvoise based GCSE does not).
  When optimizing for size, GCSE is done using Morel-Renvoise Partial
  Redundancy Elimination, with the exception that it does not try to move
  invariants out of loops; that is left to the loop optimization pass.
  If MR PRE GCSE is done, code hoisting (aka unification) is also done, as
  well as load motion.
  If you are optimizing for speed, LCM (lazy code motion) based GCSE is
  done. LCM is based on the work of Knoop, Rüthing, and Steffen. LCM
  based GCSE also does loop invariant code motion. We also perform load
  and store motion when optimizing for speed.
  Regardless of which type of GCSE is used, the GCSE pass also performs
  global constant and copy propagation.
  The source file for this pass is :samp:`gcse.cc`, and the LCM routines
  are in :samp:`lcm.cc`.

* Loop optimization

  This pass performs several loop related optimizations.
  The source files :samp:`cfgloopanal.cc` and :samp:`cfgloopmanip.cc` contain
  generic loop analysis and manipulation code. Initialization and finalization
  of loop structures is handled by :samp:`loop-init.cc`.
  A loop invariant motion pass is implemented in :samp:`loop-invariant.cc`.
  Basic block level optimizations (loop unrolling and peeling)
  are implemented in :samp:`loop-unroll.cc`.
  Replacement of loop exit conditions with special machine-dependent
  instructions is handled by :samp:`loop-doloop.cc`.

* Jump bypassing

  This pass is an aggressive form of GCSE that transforms the control
  flow graph of a function by propagating constants into conditional
  branch instructions. The source file for this pass is :samp:`gcse.cc`.

* If conversion

  This pass attempts to replace conditional branches and surrounding
  assignments with arithmetic, boolean value producing comparison
  instructions, and conditional move instructions. In the very last
  invocation after reload/LRA, it will generate predicated instructions
  when supported by the target. The code is located in :samp:`ifcvt.cc`.
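
The effect of replacing a branch with arithmetic can be shown at source
level. The sketch below is only an analogy written in Python; the function
names are hypothetical, and the actual pass rewrites RTL rather than source
code.

```python
def select_branchy(cond, a, b):
    # Conditional branch with an assignment on each arm.
    if cond:
        x = a
    else:
        x = b
    return x


def select_branchless(cond, a, b):
    # The comparison result is materialised as a value: mask is -1
    # (all bits set) when cond is true and 0 otherwise.
    mask = -int(bool(cond))
    # Pure arithmetic selects a or b without any branch.
    return (a & mask) | (b & ~mask)
```

Both functions return the same value for all integer inputs; whether the
branch-free form is profitable is a target-specific cost question.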

* Web construction

  This pass splits independent uses of each pseudo-register. This can
  improve the effect of other transformations, such as CSE or register
  allocation. The code for this pass is located in :samp:`web.cc`.
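
For straight-line code, splitting independent uses of a register amounts to
giving every new definition its own name. The sketch below shows only this
special case; the ``(dest, [sources])`` instruction format and the name
``split_webs`` are assumptions for the example, and a real implementation
has to follow definitions and uses across control flow.

```python
def split_webs(block):
    """Give each independent live range of a register its own name."""
    current = {}  # original register -> name of the web currently live
    count = {}    # original register -> number of definitions seen so far
    out = []
    for dest, srcs in block:
        # Uses refer to whichever definition is live at this point.
        new_srcs = [current.get(s, s) for s in srcs]
        n = count.get(dest, 0)
        count[dest] = n + 1
        # The first definition keeps its name; later ones get a suffix.
        current[dest] = dest if n == 0 else f"{dest}.{n}"
        out.append((current[dest], new_srcs))
    return out


block = [("r1", ["a"]), ("x", ["r1"]), ("r1", ["b"]), ("y", ["r1"])]
print(split_webs(block))
# prints [('r1', ['a']), ('x', ['r1']), ('r1.1', ['b']), ('y', ['r1.1'])]
```

After the split, the two unrelated values that shared ``r1`` can be
allocated or optimized independently.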

* Instruction combination

  This pass attempts to combine groups of two or three instructions that
  are related by data flow into single instructions. It combines the
  RTL expressions for the instructions by substitution, simplifies the
  result using algebra, and then attempts to match the result against
  the machine description. The code is located in :samp:`combine.cc`.
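
The substitute, simplify and match sequence described above can be sketched
with nested tuples standing in for RTL expressions. Everything here is
illustrative: the helper names, the single algebraic rule and the
``recognized`` predicate (playing the role of a machine description) are
assumptions, and the sketch assumes the first instruction's result is not
used anywhere else.

```python
def substitute(expr, reg, value):
    """Replace every use of register reg inside expr by value."""
    if expr == reg:
        return value
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(e, reg, value) for e in expr[1:])
    return expr


def simplify(expr):
    """One algebraic rule: multiplying by a power of two becomes a shift."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    if (op == "mult" and isinstance(args[1], int)
            and args[1] > 0 and args[1] & (args[1] - 1) == 0):
        return ("ashift", args[0], args[1].bit_length() - 1)
    return (op, *args)


def recognized(expr):
    """Hypothetical target: which expression shapes fit in one instruction."""
    def leaf(e):
        return isinstance(e, str)
    if leaf(expr):
        return True
    op, a, b = expr
    if op in ("plus", "mult") and leaf(a) and leaf(b):
        return True
    if op == "ashift" and leaf(a) and isinstance(b, int):
        return True
    # An add whose second operand is a register shifted left by at most 3.
    return (op == "plus" and leaf(a) and isinstance(b, tuple)
            and b[0] == "ashift" and leaf(b[1])
            and isinstance(b[2], int) and b[2] <= 3)


def try_combine(first, second):
    """Each insn is (dest, expr). Return the combined insn, or None."""
    dest1, expr1 = first
    dest2, expr2 = second
    combined = simplify(substitute(expr2, dest1, expr1))
    return (dest2, combined) if recognized(combined) else None


print(try_combine(("t", ("mult", "a", 4)), ("r", ("plus", "b", "t"))))
# prints ('r', ('plus', 'b', ('ashift', 'a', 2)))
```

If the simplified result matches no pattern, ``try_combine`` returns ``None``
and the two instructions stay as they were.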

* Mode switching optimization

  This pass looks for instructions that require the processor to be in a
  specific 'mode' and minimizes the number of mode changes required to
  satisfy all users. What these modes are, and what they apply to, are
  completely target-specific. The code for this pass is located in
  :samp:`mode-switching.cc`.

.. index:: modulo scheduling, sms, swing, software pipelining

* Modulo scheduling

  This pass looks at innermost loops and reorders their instructions
  by overlapping different iterations. Modulo scheduling is performed
  immediately before instruction scheduling. The code for this pass is
  located in :samp:`modulo-sched.cc`.

* Instruction scheduling

  This pass looks for instructions whose output will not be available by
  the time that it is used in subsequent instructions. Memory loads and
  floating point instructions often have this behavior on RISC machines.
  It re-orders instructions within a basic block to try to separate the
  definition and use of items that otherwise would cause pipeline
  stalls. This pass is performed twice, before and after register
  allocation. The code for this pass is located in :samp:`haifa-sched.cc`,
  :samp:`sched-deps.cc`, :samp:`sched-ebb.cc`, :samp:`sched-rgn.cc` and
  :samp:`sched-vis.c`.
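
Separating a definition from its use to hide latency can be illustrated
with a very small list scheduler for a single-issue machine. This is a
sketch under simplifying assumptions: the function name, the dependence and
latency tables, and the "first ready instruction wins" choice are all made up
for the example, whereas a production scheduler ranks candidates by priority.

```python
def list_schedule(insns, deps, latency):
    """insns: names in original order.
    deps: {insn: set of insns whose results it reads} (must be acyclic).
    latency: {insn: cycles until its result is available}.
    Returns (issue order, {insn: issue cycle})."""
    ready_at = {}  # insn -> first cycle at which its result can be used
    issue = {}
    order = []
    remaining = list(insns)
    cycle = 0

    def earliest(i):
        # First cycle at which all operands of i are available.
        return max([ready_at[d] for d in deps.get(i, ())], default=0)

    while remaining:
        # Candidates: every producer has already been issued.
        cands = [i for i in remaining if all(d in issue for d in deps.get(i, ()))]
        # Of those, keep the ones that can issue now without stalling.
        ok = [i for i in cands if earliest(i) <= cycle]
        if ok:
            pick = ok[0]  # original order breaks ties
            issue[pick] = cycle
            ready_at[pick] = cycle + latency[pick]
            order.append(pick)
            remaining.remove(pick)
        cycle += 1  # if nothing was ready, this cycle is a stall
    return order, issue


insns = ["load1", "add1", "load2", "add2"]
deps = {"add1": {"load1"}, "add2": {"load2"}}
latency = {"load1": 3, "load2": 3, "add1": 1, "add2": 1}
order, issue = list_schedule(insns, deps, latency)
print(order)  # prints ['load1', 'load2', 'add1', 'add2']
```

Moving ``load2`` ahead of ``add1`` fills one of the cycles that would
otherwise be spent waiting for ``load1``.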

* Register allocation

  These passes make sure that all occurrences of pseudo registers are
  eliminated, either by allocating them to a hard register, replacing
  them by an equivalent expression (e.g. a constant) or by placing
  them on the stack. This is done in several subpasses:

  * The integrated register allocator (IRA). It is called
    integrated because coalescing, register live range splitting, and hard
    register preferencing are done on-the-fly during coloring. It also
    has better integration with the reload/LRA pass. Pseudo-registers spilled
    by the allocator or by reload/LRA still have a chance to get
    hard registers if reload/LRA evicts some pseudo-registers from
    hard registers. The allocator helps to choose better pseudos for
    spilling based on their live ranges and to coalesce stack slots
    allocated for the spilled pseudo-registers. IRA is a regional
    register allocator which is transformed into a Chaitin-Briggs allocator
    if there is only one region. By default, IRA chooses regions using
    register pressure, but the user can force it to use one region or
    regions corresponding to all loops.

    Source files of the allocator are :samp:`ira.cc`, :samp:`ira-build.cc`,
    :samp:`ira-costs.cc`, :samp:`ira-conflicts.cc`, :samp:`ira-color.cc`,
    :samp:`ira-emit.cc`, :samp:`ira-lives.cc`, plus header files :samp:`ira.h`
    and :samp:`ira-int.h` used for the communication between the allocator
    and the rest of the compiler and between the IRA files.

  .. index:: reloading

  * Reloading. This pass renumbers pseudo registers with the hard
    register numbers they were allocated. Pseudo registers that did not
    get hard registers are replaced with stack slots. Then it finds
    instructions that are invalid because a value has failed to end up in
    a register, or has ended up in a register of the wrong kind. It fixes
    up these instructions by reloading the problematic values
    temporarily into registers. Additional instructions are generated to
    do the copying.

    The reload pass also optionally eliminates the frame pointer and inserts
    instructions to save and restore call-clobbered registers around calls.

    Source files are :samp:`reload.cc` and :samp:`reload1.cc`, plus the header
    :samp:`reload.h` used for communication between them.

  .. index:: Local Register Allocator (LRA)

  * The local register allocator (LRA). This pass is a modern replacement
    for the reload pass. Source files
    are :samp:`lra.cc`, :samp:`lra-assign.cc`, :samp:`lra-coalesce.cc`,
    :samp:`lra-constraints.cc`, :samp:`lra-eliminations.cc`,
    :samp:`lra-lives.cc`, :samp:`lra-remat.cc`, :samp:`lra-spills.cc`, the
    header :samp:`lra-int.h` used for communication between them, and the
    header :samp:`lra.h` used for communication between LRA and the rest of
    the compiler.

    Unlike the reload pass, intermediate LRA decisions are reflected in
    RTL as much as possible. This reduces the number of target-dependent
    macros and hooks, leaving instruction constraints as the primary
    source of control.

    LRA is run on targets for which ``TARGET_LRA_P`` returns true.
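
Since the text mentions Chaitin-Briggs allocation, a compact sketch of that
style of graph coloring may help. It is a textbook-style toy, not IRA: the
function name ``color_graph`` and the dictionary representation of the
interference graph are assumptions, the graph must be symmetric, and there is
no coalescing, live range splitting or cost model.

```python
def color_graph(interference, k):
    """interference: {pseudo: set of pseudos live at the same time}.
    k: number of hard registers. Returns (assignment, spilled)."""
    graph = {n: set(adj) for n, adj in interference.items()}
    stack = []
    while graph:
        # Simplify: prefer a node with fewer than k neighbours, which is
        # always colourable. If none exists, push the highest-degree node
        # as an optimistic spill candidate. Names break ties.
        node = min(graph, key=lambda n: (len(graph[n]) >= k, -len(graph[n]), n))
        stack.append(node)
        for m in graph.pop(node):
            graph[m].discard(node)

    assignment, spilled = {}, []
    while stack:
        # Select: colour nodes in reverse order of removal.
        node = stack.pop()
        used = {assignment[m] for m in interference[node] if m in assignment}
        free = [c for c in range(k) if c not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)  # would be placed in a stack slot
    return assignment, spilled


# a, b and c are mutually live (a triangle); d only conflicts with a.
interference = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(color_graph(interference, 2))
# prints ({'c': 0, 'b': 1, 'd': 0}, ['a'])
```

With two registers the triangle cannot be colored, so one pseudo is spilled;
with three registers the same graph colors without spills.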

* Basic block reordering

  This pass implements profile guided code positioning. If profile
  information is not available, various types of static analysis are
  performed to make the predictions normally coming from the profile
  feedback (i.e. execution frequency, branch probability, etc.). It is
  implemented in the file :samp:`bb-reorder.cc`, and the various
  prediction routines are in :samp:`predict.cc`.

* Variable tracking

  This pass computes where the variables are stored at each
  position in the code and adds notes describing the variable locations
  to the RTL code. Location lists are then generated from these
  notes for the debug information, if the debugging information format
  supports location lists. The code is located in :samp:`var-tracking.cc`.

* Delayed branch scheduling

  This optional pass attempts to find instructions that can go into the
  delay slots of other instructions, usually jumps and calls. The code
  for this pass is located in :samp:`reorg.cc`.

* Branch shortening

  On many RISC machines, branch instructions have a limited range.
  Thus, longer sequences of instructions must be used for long branches.
  In this pass, the compiler figures out how far each instruction
  will be from each other instruction, and therefore whether the usual
  instructions, or the longer sequences, must be used for each branch.
  The code for this pass is located in :samp:`final.cc`.
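
Choosing between the usual and the longer branch form is a fixed-point
problem, because lengthening one branch moves every later instruction and
can push another branch out of range. The sketch below shows one common way
to solve it; the sizes, the range limit, the instruction encoding and the
name ``shorten_branches_toy`` are assumptions for illustration, and distances
are measured from the start of the branch for simplicity.

```python
def shorten_branches_toy(insns, short_size=2, long_size=6, short_range=127):
    """insns: list of ('op', size), ('label', name) or ('branch', target).
    Returns {index of branch: 'short' or 'long'}."""
    branches = [i for i, insn in enumerate(insns) if insn[0] == "branch"]
    is_long = {i: False for i in branches}  # start optimistic: all short
    changed = True
    while changed:
        changed = False
        # Lay out the code under the current short/long assumptions.
        addr, labels, pc = [], {}, 0
        for i, insn in enumerate(insns):
            addr.append(pc)
            if insn[0] == "label":
                labels[insn[1]] = pc
            elif insn[0] == "branch":
                pc += long_size if is_long[i] else short_size
            else:
                pc += insn[1]
        # Lengthen every short branch whose target is out of range.
        # Branches only ever grow, so the loop terminates.
        for i in branches:
            if not is_long[i] and abs(labels[insns[i][1]] - addr[i]) > short_range:
                is_long[i] = True
                changed = True
    return {i: "long" if is_long[i] else "short" for i in branches}


prog = [
    ("branch", "end"),   # 124 bytes away while the second branch is short
    ("op", 120),
    ("branch", "far"),   # 202 bytes away: must be long
    ("label", "end"),
    ("op", 200),
    ("label", "far"),
]
print(shorten_branches_toy(prog))  # prints {0: 'long', 2: 'long'}
```

In this example the first branch fits in the short form initially, but
lengthening the second branch moves ``end`` to offset 128, so a second
iteration lengthens the first branch as well.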

* Register-to-stack conversion

  Conversion from usage of some hard registers to usage of a register
  stack may be done at this point. Currently, this is supported only
  for the floating-point registers of the Intel 80387 coprocessor. The
  code for this pass is located in :samp:`reg-stack.cc`.

* Final

  This pass outputs the assembler code for the function. The source files
  are :samp:`final.cc` plus :samp:`insn-output.cc`; the latter is generated
  automatically from the machine description by the tool :samp:`genoutput`.
  The header file :samp:`conditions.h` is used for communication between
  these files.

* Debugging information output

  This is run after final because it must output the stack slot offsets
  for pseudo registers that did not get hard registers. Source files
  are :samp:`dwarfout.c` for
  DWARF symbol table format, files :samp:`dwarf2out.cc` and :samp:`dwarf2asm.cc`
  for DWARF2 symbol table format, and :samp:`vmsdbgout.cc` for VMS debug
  symbol table format.