..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. index:: peephole optimizer definitions, defining peephole optimizers

.. _peephole-definitions:

Machine-Specific Peephole Optimizers
************************************

In addition to instruction patterns the :samp:`md` file may contain
definitions of machine-specific peephole optimizations.

The combiner does not notice certain peephole optimizations when the data
flow in the program does not suggest that it should try them.  For example,
sometimes two consecutive insns related in purpose can be combined even
though the second one does not appear to use a register computed in the
first one.  A machine-specific peephole optimizer can detect such
opportunities.

There are two forms of peephole definitions that may be used.  The
original ``define_peephole`` is run at assembly output time to
match insns and substitute assembly text.  Use of ``define_peephole``
is deprecated.

A newer ``define_peephole2`` matches insns and substitutes new
insns.  The ``peephole2`` pass is run after register allocation
but before scheduling, which may result in much better code for
targets that do scheduling.

.. toctree::
  :maxdepth: 2


.. index:: define_peephole

.. _define_peephole:

RTL to Text Peephole Optimizers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A definition looks like this:

.. code-block::

  (define_peephole
    [insn-pattern-1
     insn-pattern-2
     ...]
    "condition"
    "template"
    "optional-insn-attributes")

The last string operand may be omitted if you are not using any
machine-specific information in this machine description.  If present,
it must obey the same rules as in a ``define_insn``.

In this skeleton, :samp:`{insn-pattern-1}` and so on are patterns to match
consecutive insns.  The optimization applies to a sequence of insns when
:samp:`{insn-pattern-1}` matches the first one, :samp:`{insn-pattern-2}` matches
the next, and so on.

Each of the insns matched by a peephole must also match a
``define_insn``.  Peepholes are checked only at the last stage just
before code generation, and only optionally.  Therefore, any insn which
would match a peephole but no ``define_insn`` will cause a crash in code
generation in an unoptimized compilation, or at various optimization
stages.

The operands of the insns are matched with ``match_operand``,
``match_operator``, and ``match_dup``, as usual.  What is not
usual is that the operand numbers apply to all the insn patterns in the
definition.  So, you can check for identical operands in two insns by
using ``match_operand`` in one insn and ``match_dup`` in the
other.

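As a minimal sketch of this shared operand numbering (not taken from any
real port; the ``load`` and ``move`` mnemonics in the template are
hypothetical), the following peephole matches two consecutive loads from
the same memory location.  Operand 1 is introduced with ``match_operand``
in the first insn and required to recur with ``match_dup`` in the second:

.. code-block::

  (define_peephole
    [(set (match_operand:SI 0 "register_operand" "")
          (match_operand:SI 1 "memory_operand" ""))
     (set (match_operand:SI 2 "register_operand" "")
          (match_dup 1))]
    ;; Assumed safety check: the first load must not overwrite a
    ;; register used in the address of operand 1.
    "! reg_overlap_mentioned_p (operands[0], operands[1])"
    "load %0,%1\;move %2,%0")

The template refers to operands 0, 1 and 2 with the same numbers that
were used while matching both insns.
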
The operand constraints used in ``match_operand`` patterns do not have
any direct effect on the applicability of the peephole, but they will
be validated afterward, so make sure your constraints are general enough
to apply whenever the peephole matches.  If the peephole matches
but the constraints are not satisfied, the compiler will crash.

It is safe to omit constraints in all the operands of the peephole; or
you can write constraints which serve as a double-check on the criteria
previously tested.

Once a sequence of insns matches the patterns, the :samp:`{condition}` is
checked.  This is a C expression which makes the final decision whether to
perform the optimization (we do so if the expression is nonzero).  If
:samp:`{condition}` is omitted (in other words, the string is empty) then the
optimization is applied to every sequence of insns that matches the
patterns.

The defined peephole optimizations are applied after register allocation
is complete.  Therefore, the peephole definition can check which
operands have ended up in which kinds of registers, just by looking at
the operands.

.. index:: prev_active_insn

The way to refer to the operands in :samp:`{condition}` is to write
``operands[i]`` for operand number :samp:`{i}` (as matched by
``(match_operand i ...)``).  Use the variable ``insn``
to refer to the last of the insns being matched; use
``prev_active_insn`` to find the preceding insns.

.. index:: dead_or_set_p

When optimizing computations with intermediate results, you can use
:samp:`{condition}` to match only when the intermediate results are not used
elsewhere.  Use the C expression ``dead_or_set_p (insn, op)``,
where :samp:`{insn}` is the insn in which you expect the value
to be used for the last time (from the value of ``insn``, together
with use of ``prev_nonnote_insn``), and :samp:`{op}` is the intermediate
value (from ``operands[i]``).

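The following sketch shows such a condition in context.  It is an
illustration rather than a pattern from an existing port, and the
three-operand ``addi`` mnemonic is hypothetical.  Operand 0 is the
intermediate value: it is copied from operand 1 and then consumed by the
add, so the copy can be folded away only if operand 0 dies in, or is
overwritten by, the last matched insn:

.. code-block::

  (define_peephole
    [(set (match_operand:SI 0 "register_operand" "")
          (match_operand:SI 1 "register_operand" ""))
     (set (match_operand:SI 2 "register_operand" "")
          (plus:SI (match_dup 0)
                   (match_operand:SI 3 "const_int_operand" "")))]
    ;; insn is the last matched insn, here the add.
    "dead_or_set_p (insn, operands[0])"
    "addi %2,%1,%3")
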
Applying the optimization means replacing the sequence of insns with one
new insn.  The :samp:`{template}` controls ultimate output of assembler code
for this combined insn.  It works exactly like the template of a
``define_insn``.  Operand numbers in this template are the same ones
used in matching the original sequence of insns.

The result of a defined peephole optimizer does not need to match any of
the insn patterns in the machine description; it does not even have an
opportunity to match them.  The peephole optimizer definition itself serves
as the insn pattern to control how the insn is output.

Defined peephole optimizers are run as assembler code is being output,
so the insns they produce are never combined or rearranged in any way.

Here is an example, taken from the 68000 machine description:

.. code-block::

  (define_peephole
    [(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4)))
     (set (match_operand:DF 0 "register_operand" "=f")
          (match_operand:DF 1 "register_operand" "ad"))]
    "FP_REG_P (operands[0]) && ! FP_REG_P (operands[1])"
  {
    rtx xoperands[2];
    xoperands[1] = gen_rtx_REG (SImode, REGNO (operands[1]) + 1);
  #ifdef MOTOROLA
    output_asm_insn ("move.l %1,(sp)", xoperands);
    output_asm_insn ("move.l %1,-(sp)", operands);
    return "fmove.d (sp)+,%0";
  #else
    output_asm_insn ("movel %1,sp@", xoperands);
    output_asm_insn ("movel %1,sp@-", operands);
    return "fmoved sp@+,%0";
  #endif
  })

The effect of this optimization is to change

.. code-block::

  jbsr _foobar
  addql #4,sp
  movel d1,sp@-
  movel d0,sp@-
  fmoved sp@+,fp0

into

.. code-block::

  jbsr _foobar
  movel d1,sp@
  movel d0,sp@-
  fmoved sp@+,fp0

If a peephole matches a sequence including one or more jump insns, you must
take account of the flags such as ``CC_REVERSED`` which specify that the
condition codes are represented in an unusual manner.  The compiler
automatically alters any ordinary conditional jumps which occur in such
situations, but the compiler cannot alter jumps which have been replaced by
peephole optimizations.  So it is up to you to alter the assembler code
that the peephole produces.  Supply C code to write the assembler output,
and in this C code check the condition code status flags and change the
assembler code as appropriate.

:samp:`{insn-pattern-1}` and so on look *almost* like the second
operand of ``define_insn``.  There is one important difference: the
second operand of ``define_insn`` consists of one or more RTX's
enclosed in square brackets.  Usually, there is only one: then the same
action can be written as an element of a ``define_peephole``.  But
when there are multiple actions in a ``define_insn``, they are
implicitly enclosed in a ``parallel``.  Then you must explicitly
write the ``parallel``, and the square brackets within it, in the
``define_peephole``.  Thus, if an insn pattern looks like this,

.. code-block::

  (define_insn "divmodsi4"
    [(set (match_operand:SI 0 "general_operand" "=d")
          (div:SI (match_operand:SI 1 "general_operand" "0")
                  (match_operand:SI 2 "general_operand" "dmsK")))
     (set (match_operand:SI 3 "general_operand" "=d")
          (mod:SI (match_dup 1) (match_dup 2)))]
    "TARGET_68020"
    "divsl%.l %2,%3:%0")

then the way to mention this insn in a peephole is as follows:

.. code-block::

  (define_peephole
    [...
     (parallel
      [(set (match_operand:SI 0 "general_operand" "=d")
            (div:SI (match_operand:SI 1 "general_operand" "0")
                    (match_operand:SI 2 "general_operand" "dmsK")))
       (set (match_operand:SI 3 "general_operand" "=d")
            (mod:SI (match_dup 1) (match_dup 2)))])
     ...]
    ...)

.. index:: define_peephole2

.. _define_peephole2:

RTL to RTL Peephole Optimizers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``define_peephole2`` definition tells the compiler how to
substitute one sequence of instructions for another sequence,
what additional scratch registers may be needed and what their
lifetimes must be.

.. code-block::

  (define_peephole2
    [insn-pattern-1
     insn-pattern-2
     ...]
    "condition"
    [new-insn-pattern-1
     new-insn-pattern-2
     ...]
    "preparation-statements")

The definition is almost identical to ``define_split``
(see :ref:`insn-splitting`) except that the pattern to match is not a
single instruction, but a sequence of instructions.

It is possible to request additional scratch registers for use in the
output template.  If appropriate registers are not free, the pattern
will simply not match.

.. index:: match_scratch, match_dup

Scratch registers are requested with a ``match_scratch`` pattern at
the top level of the input pattern.  The allocated register (initially) will
be dead at the point requested within the original sequence.  If the scratch
is used at more than a single point, a ``match_dup`` pattern at the
top level of the input pattern marks the last position in the input sequence
at which the register must be available.

Here is an example from the IA-32 machine description:

.. code-block::

  (define_peephole2
    [(match_scratch:SI 2 "r")
     (parallel [(set (match_operand:SI 0 "register_operand" "")
                     (match_operator:SI 3 "arith_or_logical_operator"
                       [(match_dup 0)
                        (match_operand:SI 1 "memory_operand" "")]))
                (clobber (reg:CC 17))])]
    "! optimize_size && ! TARGET_READ_MODIFY"
    [(set (match_dup 2) (match_dup 1))
     (parallel [(set (match_dup 0)
                     (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
                (clobber (reg:CC 17))])]
    "")

This pattern tries to split a load from its use in the hopes that we'll be
able to schedule around the memory load latency.  It allocates a single
``SImode`` register of class ``GENERAL_REGS`` (``"r"``) that needs
to be live only at the point just before the arithmetic.

A real example requiring extended scratch lifetimes is harder to come by,
so here's a silly made-up example:

.. code-block::

  (define_peephole2
    [(match_scratch:SI 4 "r")
     (set (match_operand:SI 0 "" "") (match_operand:SI 1 "" ""))
     (set (match_operand:SI 2 "" "") (match_dup 1))
     (match_dup 4)
     (set (match_operand:SI 3 "" "") (match_dup 1))]
    "/* determine 1 does not overlap 0 and 2 */"
    [(set (match_dup 4) (match_dup 1))
     (set (match_dup 0) (match_dup 4))
     (set (match_dup 2) (match_dup 4))
     (set (match_dup 3) (match_dup 4))]
    "")

There are two special macros defined for use in the preparation statements:
``DONE`` and ``FAIL``.  Use them with a following semicolon,
as a statement.

.. index:: DONE

.. envvar:: DONE

  Use the ``DONE`` macro to end RTL generation for the peephole.  The
  only RTL insns generated as replacement for the matched input insn will
  be those already emitted by explicit calls to ``emit_insn`` within
  the preparation statements; the replacement pattern is not used.

.. envvar:: FAIL

  Make the ``define_peephole2`` fail on this occasion.  When a ``define_peephole2``
  fails, it means that the replacement was not truly available for the
  particular inputs it was given.  In that case, GCC may still apply a
  later ``define_peephole2`` that also matches the given insn pattern.
  (Note that this is different from ``define_split``, where ``FAIL``
  prevents the input insn from being split at all.)

If the preparation falls through (invokes neither ``DONE`` nor
``FAIL``), then the ``define_peephole2`` uses the replacement
template.

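The three outcomes can be seen in a small made-up pattern.  This is a
sketch under stated assumptions, not a definition from any port: it
assumes the GCC internal helpers ``peep2_reg_dead_p``, ``rtx_equal_p``
and ``emit_move_insn``, and the choice of when to ``FAIL`` is purely
illustrative.  The pattern collapses a copy through a temporary
register into a single move:

.. code-block::

  (define_peephole2
    [(set (match_operand:SI 0 "register_operand" "")
          (match_operand:SI 1 "register_operand" ""))
     (set (match_operand:SI 2 "register_operand" "")
          (match_dup 0))]
    ;; Operand 0 must be dead after the second matched insn.
    "peep2_reg_dead_p (2, operands[0])"
    [(const_int 0)]   ;; Placeholder; never used because of DONE.
  {
    /* Illustrative only: decline when the result would be a
       register copied onto itself, leaving the insns for any
       later define_peephole2 that also matches.  */
    if (rtx_equal_p (operands[1], operands[2]))
      FAIL;
    /* Emit the replacement explicitly and stop.  */
    emit_move_insn (operands[2], operands[1]);
    DONE;
  })

Had the preparation statements neither failed nor invoked ``DONE``,
the replacement template would have been emitted instead, so a real
pattern of this shape would normally write the move in the template.
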
If we had not added the ``(match_dup 4)`` in the middle of the input
sequence, it might have been the case that the register we chose at the
beginning of the sequence is killed by the first or second ``set``.