..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. _rtl-passes:

RTL passes
**********

The following briefly describes the RTL generation and optimization
passes that are run after the Tree optimization passes.

* RTL generation

  .. Avoiding overfull is tricky here.

  The source files for RTL generation include
  :samp:`stmt.cc`,
  :samp:`calls.cc`,
  :samp:`expr.cc`,
  :samp:`explow.cc`,
  :samp:`expmed.cc`,
  :samp:`function.cc`,
  :samp:`optabs.cc`
  and :samp:`emit-rtl.cc`.
  Also, the file
  :samp:`insn-emit.cc`, generated from the machine description by the
  program ``genemit``, is used in this pass.  The header file
  :samp:`expr.h` is used for communication within this pass.

  .. index:: genflags, gencodes

  The header files :samp:`insn-flags.h` and :samp:`insn-codes.h`,
  generated from the machine description by the programs ``genflags``
  and ``gencodes``, tell this pass which standard names are available
  for use and which patterns correspond to them.

* Generation of exception landing pads

  This pass generates the glue that handles communication between the
  exception handling library routines and the exception handlers within
  the function.  Entry points in the function that are invoked by the
  exception handling library are called :dfn:`landing pads`.  The code
  for this pass is located in :samp:`except.cc`.

* Control flow graph cleanup

  This pass removes unreachable code and simplifies jumps to the next
  instruction, jumps to jumps, jumps across jumps, etc.  The pass is run
  multiple times.  For historical reasons, it is occasionally referred to
  as the 'jump optimization pass'.  The bulk of the code for this pass is
  in :samp:`cfgcleanup.cc`, and there are support routines in
  :samp:`cfgrtl.cc` and :samp:`jump.cc`.
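
As a rough illustration of two of the cleanups named above (jumps to
jumps and unreachable code), here is a minimal Python sketch over a toy
control flow graph.  The block representation and the function names
``thread_jumps`` and ``remove_unreachable`` are hypothetical and are not
taken from :samp:`cfgcleanup.cc`.

```python
# Toy CFG: label -> (instructions, successor labels).  Hypothetical model,
# not the data structures GCC uses.
def thread_jumps(blocks):
    """Retarget edges that point at empty forwarder blocks (jumps to jumps)."""
    def final_target(label, seen=()):
        insns, succs = blocks[label]
        # An empty block with one successor only forwards control.
        if not insns and len(succs) == 1 and label not in seen:
            return final_target(succs[0], seen + (label,))
        return label
    return {label: (insns, [final_target(s) for s in succs])
            for label, (insns, succs) in blocks.items()}

def remove_unreachable(blocks, entry):
    """Drop every block that cannot be reached from the entry block."""
    reachable, worklist = set(), [entry]
    while worklist:
        label = worklist.pop()
        if label not in reachable:
            reachable.add(label)
            worklist.extend(blocks[label][1])
    return {label: b for label, b in blocks.items() if label in reachable}

cfg = {
    "entry": (["x = 1"], ["fwd1"]),
    "fwd1": ([], ["fwd2"]),        # jump to a jump
    "fwd2": ([], ["exit"]),
    "dead": (["y = 2"], ["exit"]), # no predecessor
    "exit": (["return x"], []),
}
cleaned = remove_unreachable(thread_jumps(cfg), "entry")
```

After threading, ``entry`` branches straight to ``exit``, so the two
forwarder blocks become unreachable and are removed together with
``dead``.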

* Forward propagation of single-def values

  This pass attempts to remove redundant computation by substituting
  variables that come from a single definition, and
  seeing if the result can be simplified.  It performs copy propagation
  and addressing mode selection.  The pass is run twice, with values
  being propagated into loops only on the second run.  The code is
  located in :samp:`fwprop.cc`.

* Common subexpression elimination

  This pass removes redundant computation within basic blocks, and
  optimizes addressing modes based on cost.  The pass is run twice.
  The code for this pass is located in :samp:`cse.cc`.
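
The idea of removing redundant computation within one basic block can be
sketched as follows.  This is a simplified model, assuming a hypothetical
three-address instruction format ``(dest, op, arg1, arg2)``; it ignores
commutativity, memory, and costs, and is not how :samp:`cse.cc` is
organized.

```python
def local_cse(block):
    """Replace recomputed expressions in one basic block with copies."""
    available = {}  # (op, arg1, arg2) -> register currently holding the value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, a, b))
        # dest is overwritten: forget expressions that read it or live in it.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        # Record the new value unless dest was one of its own operands.
        if dest not in (a, b):
            available.setdefault(key, dest)
    return out

block = [
    ("r1", "add", "a", "b"),
    ("r2", "add", "a", "b"),   # redundant: same value as r1
    ("a", "mul", "r1", "r2"),  # overwrites a, so a + b is no longer available
    ("r3", "add", "a", "b"),   # must be recomputed
]
optimized = local_cse(block)
```

Only the second instruction becomes a copy; the last one is kept because
``a`` changes between the two computations.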

* Global common subexpression elimination

  This pass performs one of two different types of GCSE, depending on
  whether you are optimizing for size or not (LCM based GCSE tends to
  increase code size for a gain in speed, while Morel-Renvoise based
  GCSE does not).
  When optimizing for size, GCSE is done using Morel-Renvoise Partial
  Redundancy Elimination, with the exception that it does not try to move
  invariants out of loops; that is left to the loop optimization pass.
  If MR PRE GCSE is done, code hoisting (aka unification) is also done, as
  well as load motion.
  If you are optimizing for speed, LCM (lazy code motion) based GCSE is
  done.  LCM is based on the work of Knoop, Rüthing, and Steffen.  LCM
  based GCSE also does loop invariant code motion.  We also perform load
  and store motion when optimizing for speed.
  Regardless of which type of GCSE is used, the GCSE pass also performs
  global constant and copy propagation.
  The source file for this pass is :samp:`gcse.cc`, and the LCM routines
  are in :samp:`lcm.cc`.

* Loop optimization

  This pass performs several loop related optimizations.
  The source files :samp:`cfgloopanal.cc` and :samp:`cfgloopmanip.cc` contain
  generic loop analysis and manipulation code.  Initialization and finalization
  of loop structures is handled by :samp:`loop-init.cc`.
  A loop invariant motion pass is implemented in :samp:`loop-invariant.cc`.
  Basic block level optimizations (unrolling and peeling loops)
  are implemented in :samp:`loop-unroll.cc`.
  Replacement of the exit condition of loops by special machine-dependent
  instructions is handled by :samp:`loop-doloop.cc`.

* Jump bypassing

  This pass is an aggressive form of GCSE that transforms the control
  flow graph of a function by propagating constants into conditional
  branch instructions.  The source file for this pass is :samp:`gcse.cc`.

* If conversion

  This pass attempts to replace conditional branches and surrounding
  assignments with arithmetic, boolean value producing comparison
  instructions, and conditional move instructions.  In the very last
  invocation after reload/LRA, it will generate predicated instructions
  when supported by the target.  The code is located in :samp:`ifcvt.cc`.
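
A source-level analogy may help picture the transformation.  The pass
itself works on RTL; the Python functions below are hypothetical and only
show how a branch around two assignments can be expressed with a
comparison that produces a 0/1 value plus arithmetic.

```python
def max_branchy(a, b):
    # Conditional branch with an assignment on each arm.
    if a < b:
        result = b
    else:
        result = a
    return result

def max_branchless(a, b):
    flag = int(a < b)          # comparison producing a boolean value (0 or 1)
    return a + flag * (b - a)  # arithmetic selects b when flag is 1, else a
```

Both functions compute the same result for all integer inputs; the
second form contains no control flow that a branch predictor could
mispredict.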

* Web construction

  This pass splits independent uses of each pseudo-register.  This can
  improve the effect of other transformations, such as CSE or register
  allocation.  The code for this pass is located in :samp:`web.cc`.

* Instruction combination

  This pass attempts to combine groups of two or three instructions that
  are related by data flow into single instructions.  It combines the
  RTL expressions for the instructions by substitution, simplifies the
  result using algebra, and then attempts to match the result against
  the machine description.  The code is located in :samp:`combine.cc`.
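
The substitute-then-match idea can be sketched in a few lines.  The
sketch below is an assumption-laden toy: expressions are nested tuples,
``TARGET_PATTERNS`` is an invented stand-in for a machine description,
the algebraic simplification step is omitted, and the first
instruction's result is assumed to be dead after the second instruction.

```python
def substitute(expr, reg, value):
    """Replace every use of reg inside expr with value."""
    if expr == reg:
        return value
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(e, reg, value) for e in expr[1:])
    return expr

def shape(expr):
    """Abstract an expression to its operator tree with reg/const leaves."""
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(shape(e) for e in expr[1:])
    return "const" if isinstance(expr, int) else "reg"

# Hypothetical target: supports add, shift, and add with a shifted operand.
TARGET_PATTERNS = {
    ("plus", "reg", "reg"),
    ("ashift", "reg", "const"),
    ("plus", "reg", ("ashift", "reg", "const")),
}

def try_combine(first, second):
    """first and second are (dest, expr); return one merged insn or None."""
    dest1, expr1 = first
    dest2, expr2 = second
    merged = substitute(expr2, dest1, expr1)
    if merged != expr2 and shape(merged) in TARGET_PATTERNS:
        return (dest2, merged)
    return None

shift = ("t", ("ashift", "a", 2))
combined = try_combine(shift, ("r", ("plus", "b", "t")))
rejected = try_combine(shift, ("r", ("mult", "b", "t")))
```

The shift folds into the addition because the toy target has a matching
pattern; the multiplication is left alone because no pattern matches the
merged expression.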

* Mode switching optimization

  This pass looks for instructions that require the processor to be in a
  specific 'mode' and minimizes the number of mode changes required to
  satisfy all users.  What these modes are, and what they apply to, is
  completely target-specific.  The code for this pass is located in
  :samp:`mode-switching.cc`.

  .. index:: modulo scheduling, sms, swing, software pipelining

* Modulo scheduling

  This pass looks at innermost loops and reorders their instructions
  by overlapping different iterations.  Modulo scheduling is performed
  immediately before instruction scheduling.  The code for this pass is
  located in :samp:`modulo-sched.cc`.

* Instruction scheduling

  This pass looks for instructions whose output will not be available by
  the time that it is used in subsequent instructions.  Memory loads and
  floating point instructions often have this behavior on RISC machines.
  It re-orders instructions within a basic block to try to separate the
  definition and use of items that otherwise would cause pipeline
  stalls.  This pass is performed twice, before and after register
  allocation.  The code for this pass is located in :samp:`haifa-sched.cc`,
  :samp:`sched-deps.cc`, :samp:`sched-ebb.cc`, :samp:`sched-rgn.cc` and
  :samp:`sched-vis.c`.
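
To make the idea of separating a definition from its use concrete, here
is a small list-scheduling sketch for a hypothetical single-issue
machine.  The instruction names, latencies, and the longest-latency-first
priority are assumptions chosen for illustration; the scheduler in
:samp:`haifa-sched.cc` is far more elaborate.  The input is assumed to be
an acyclic dependence graph.

```python
def list_schedule(insns):
    """insns: name -> (latency, [names it depends on]), in program order.
    Returns a list of (issue cycle, name) for a single-issue machine."""
    ready_at = {}            # name -> cycle at which its result is available
    schedule = []
    remaining = list(insns)
    cycle = 0
    while remaining:
        # An instruction is ready once all of its inputs have been produced.
        ready = [n for n in remaining
                 if all(d in ready_at and ready_at[d] <= cycle
                        for d in insns[n][1])]
        if ready:
            pick = max(ready, key=lambda n: insns[n][0])  # longest latency first
            schedule.append((cycle, pick))
            ready_at[pick] = cycle + insns[pick][0]
            remaining.remove(pick)
        cycle += 1           # an idle cycle here models a pipeline stall
    return schedule

insns = {
    "load_a": (3, []),
    "add_a": (1, ["load_a"]),
    "load_b": (3, []),
    "add_b": (1, ["load_b"]),
}
schedule = list_schedule(insns)
```

In program order each addition would wait for its load; the schedule
issues both loads first so that the second load overlaps the latency of
the first.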

* Register allocation

  These passes make sure that all occurrences of pseudo registers are
  eliminated, either by allocating them to a hard register, replacing
  them by an equivalent expression (e.g. a constant) or by placing
  them on the stack.  This is done in several subpasses:

  * The integrated register allocator (IRA).  It is called
    integrated because coalescing, register live range splitting, and hard
    register preferencing are done on-the-fly during coloring.  It also
    has better integration with the reload/LRA pass.  Pseudo-registers
    spilled by the allocator or by reload/LRA still have a chance to get
    hard registers if reload/LRA evicts some pseudo-registers from
    hard registers.  The allocator helps to choose better pseudos for
    spilling based on their live ranges and to coalesce stack slots
    allocated for the spilled pseudo-registers.  IRA is a regional
    register allocator which becomes a Chaitin-Briggs allocator
    if there is only one region.  By default, IRA chooses regions using
    register pressure but the user can force it to use one region or
    regions corresponding to all loops.

    Source files of the allocator are :samp:`ira.cc`, :samp:`ira-build.cc`,
    :samp:`ira-costs.cc`, :samp:`ira-conflicts.cc`, :samp:`ira-color.cc`,
    :samp:`ira-emit.cc`, :samp:`ira-lives.cc`, plus header files :samp:`ira.h`
    and :samp:`ira-int.h` used for the communication between the allocator
    and the rest of the compiler and between the IRA files.

    .. index:: reloading

  * Reloading.  This pass renumbers pseudo registers with the hard
    register numbers they were allocated.  Pseudo registers that did not
    get hard registers are replaced with stack slots.  Then it finds
    instructions that are invalid because a value has failed to end up in
    a register, or has ended up in a register of the wrong kind.  It fixes
    up these instructions by reloading the problematic values
    temporarily into registers.  Additional instructions are generated to
    do the copying.

    The reload pass also optionally eliminates the frame pointer and inserts
    instructions to save and restore call-clobbered registers around calls.

    Source files are :samp:`reload.cc` and :samp:`reload1.cc`, plus the header
    :samp:`reload.h` used for communication between them.

    .. index:: Local Register Allocator (LRA)

  * The local register allocator (LRA).  This pass is a modern
    replacement of the reload pass.  Source files
    are :samp:`lra.cc`, :samp:`lra-assigns.cc`, :samp:`lra-coalesce.cc`,
    :samp:`lra-constraints.cc`, :samp:`lra-eliminations.cc`,
    :samp:`lra-lives.cc`, :samp:`lra-remat.cc`, :samp:`lra-spills.cc`, the
    header :samp:`lra-int.h` used for communication between them, and the
    header :samp:`lra.h` used for communication between LRA and the rest of
    the compiler.

    Unlike the reload pass, intermediate LRA decisions are reflected in
    RTL as much as possible.  This reduces the number of target-dependent
    macros and hooks, leaving instruction constraints as the primary
    source of control.

    LRA is run on targets for which ``TARGET_LRA_P`` returns true.
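
For readers unfamiliar with graph-coloring allocation, the following is
a textbook-style Chaitin-Briggs sketch over a single region.  It is not
IRA's code: the function name ``color_graph`` and the interference-graph
input are hypothetical, and coalescing, live range splitting,
preferencing, and spill costs are all left out.

```python
def color_graph(interference, k):
    """interference: pseudo -> set of conflicting pseudos (symmetric).
    k: number of hard registers.  Returns (assignment, spilled)."""
    graph = {n: set(adj) for n, adj in interference.items()}
    stack = []
    while graph:
        # Simplify: remove a node with fewer than k neighbours if one exists;
        # otherwise optimistically remove the highest-degree node.
        node = min(graph, key=lambda n: len(graph[n]))
        if len(graph[node]) >= k:
            node = max(graph, key=lambda n: len(graph[n]))
        stack.append(node)
        for other in graph.pop(node):
            graph[other].discard(node)
    assignment, spilled = {}, []
    while stack:
        # Select: give each node the lowest register unused by its neighbours.
        node = stack.pop()
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)   # would be placed on the stack
    return assignment, spilled

interference = {
    "p1": {"p2", "p3", "p4"},
    "p2": {"p1", "p3"},
    "p3": {"p1", "p2"},
    "p4": {"p1"},
}
colors3, spilled3 = color_graph(interference, 3)
colors2, spilled2 = color_graph(interference, 2)
```

With three registers every pseudo gets a hard register; with two, the
triangle formed by ``p1``, ``p2`` and ``p3`` cannot be colored and one
pseudo is spilled.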

* Basic block reordering

  This pass implements profile guided code positioning.  If profile
  information is not available, various types of static analysis are
  performed to make the predictions normally coming from the profile
  feedback (i.e., execution frequency, branch probability, etc.).  It is
  implemented in the file :samp:`bb-reorder.cc`, and the various
  prediction routines are in :samp:`predict.cc`.

* Variable tracking

  This pass computes where the variables are stored at each
  position in code and adds notes describing the variable locations
  to the RTL code.  Location lists are then generated from these
  notes and emitted in the debug information if the debugging information
  format supports location lists.  The code is located in
  :samp:`var-tracking.cc`.

* Delayed branch scheduling

  This optional pass attempts to find instructions that can go into the
  delay slots of other instructions, usually jumps and calls.  The code
  for this pass is located in :samp:`reorg.cc`.

* Branch shortening

  On many RISC machines, branch instructions have a limited range.
  Thus, longer sequences of instructions must be used for long branches.
  In this pass, the compiler figures out how far each instruction
  will be from each other instruction, and therefore whether the usual
  instructions, or the longer sequences, must be used for each branch.
  The code for this pass is located in :samp:`final.cc`.
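
Choosing branch forms is a fixed-point problem, because widening one
branch moves every later instruction and can push another branch out of
range.  The sketch below shows one common way to solve it: start with
all branches short and widen until nothing changes.  The sizes, the
range limit, and the instruction encoding are invented for the example
and do not describe any real target or the code in :samp:`final.cc`.

```python
SHORT, LONG, SHORT_RANGE = 2, 6, 127   # hypothetical sizes and reach in bytes

def shorten_branches(insns):
    """insns: list of ("insn", size) or ("branch", target index).
    Returns the set of indices of branches that need the long form."""
    long_form = set()
    while True:
        # Lay out the code using the current choice of branch forms.
        addr, addresses = 0, []
        for i, (kind, arg) in enumerate(insns):
            addresses.append(addr)
            if kind == "branch":
                addr += LONG if i in long_form else SHORT
            else:
                addr += arg
        # Widen every short branch whose target is out of reach.
        grown = {i for i, (kind, arg) in enumerate(insns)
                 if kind == "branch" and i not in long_form
                 and abs(addresses[arg] - addresses[i]) > SHORT_RANGE}
        if not grown:
            return long_form
        long_form |= grown

code = [
    ("branch", 3),   # 126 bytes away while branch 1 is short
    ("branch", 4),   # 134 bytes away: needs the long form
    ("insn", 122),
    ("insn", 10),
    ("insn", 1),
]
needs_long = shorten_branches(code)
```

Widening the second branch adds four bytes between the first branch and
its target, which pushes the first branch out of range as well, so the
loop needs a second round before it settles.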

* Register-to-stack conversion

  Conversion from usage of some hard registers to usage of a register
  stack may be done at this point.  Currently, this is supported only
  for the floating-point registers of the Intel 80387 coprocessor.  The
  code for this pass is located in :samp:`reg-stack.cc`.

* Final

  This pass outputs the assembler code for the function.  The source files
  are :samp:`final.cc` plus :samp:`insn-output.cc`; the latter is generated
  automatically from the machine description by the tool :samp:`genoutput`.
  The header file :samp:`conditions.h` is used for communication between
  these files.

* Debugging information output

  This is run after final because it must output the stack slot offsets
  for pseudo registers that did not get hard registers.  Source files
  are :samp:`dwarfout.c` for
  DWARF symbol table format, files :samp:`dwarf2out.cc` and :samp:`dwarf2asm.cc`
  for DWARF2 symbol table format, and :samp:`vmsdbgout.cc` for VMS debug
  symbol table format.