..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. index:: peephole optimizer definitions, defining peephole optimizers

.. _peephole-definitions:

Machine-Specific Peephole Optimizers
************************************

In addition to instruction patterns the :samp:`md` file may contain
definitions of machine-specific peephole optimizations.

The combiner does not notice certain peephole optimizations when the data
flow in the program does not suggest that it should try them. For example,
sometimes two consecutive insns related in purpose can be combined even
though the second one does not appear to use a register computed in the
first one. A machine-specific peephole optimizer can detect such
opportunities.

There are two forms of peephole definitions that may be used. The
original ``define_peephole`` is run at assembly output time to
match insns and substitute assembly text. Use of ``define_peephole``
is deprecated.

A newer ``define_peephole2`` matches insns and substitutes new
insns. The ``peephole2`` pass is run after register allocation
but before scheduling, which may result in much better code for
targets that do scheduling.

.. toctree::
  :maxdepth: 2

.. index:: define_peephole

.. _define_peephole:

RTL to Text Peephole Optimizers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A definition looks like this:

.. code-block::

  (define_peephole
    [insn-pattern-1
     insn-pattern-2
     ...]
    "condition"
    "template"
    "optional-insn-attributes")

The last string operand may be omitted if you are not using any
machine-specific information in this machine description. If present,
it must obey the same rules as in a ``define_insn``.

In this skeleton, :samp:`{insn-pattern-1}` and so on are patterns to match
consecutive insns. The optimization applies to a sequence of insns when
:samp:`{insn-pattern-1}` matches the first one, :samp:`{insn-pattern-2}` matches
the next, and so on.

Each of the insns matched by a peephole must also match a
``define_insn``. Peepholes are checked only at the last stage just
before code generation, and only optionally. Therefore, any insn which
would match a peephole but no ``define_insn`` will cause a crash in code
generation in an unoptimized compilation, or at various optimization
stages.

The operands of the insns are matched with ``match_operand``,
``match_operator``, and ``match_dup``, as usual. What is not
usual is that the operand numbers apply to all the insn patterns in the
definition. So, you can check for identical operands in two insns by
using ``match_operand`` in one insn and ``match_dup`` in the
other.
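
As a minimal sketch of this shared operand numbering, the following
fragment (a hypothetical pattern list, not taken from any real machine
description) matches a register copy followed by a second copy out of the
same register. Operand 0 is introduced by ``match_operand`` in the first
insn and referred to again by ``match_dup`` in the second:

.. code-block::

  ;; Hypothetical fragment for illustration only.
  [(set (match_operand:SI 0 "register_operand" "")
        (match_operand:SI 1 "general_operand" ""))
   ;; (match_dup 0) matches only an operand equal to the one that
   ;; matched operand 0 in the first insn.
   (set (match_operand:SI 2 "register_operand" "")
        (match_dup 0))]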

The operand constraints used in ``match_operand`` patterns do not have
any direct effect on the applicability of the peephole, but they will
be validated afterward, so make sure your constraints are general enough
to apply whenever the peephole matches. If the peephole matches
but the constraints are not satisfied, the compiler will crash.

It is safe to omit constraints in all the operands of the peephole; or
you can write constraints which serve as a double-check on the criteria
previously tested.

Once a sequence of insns matches the patterns, the :samp:`{condition}` is
checked. This is a C expression which makes the final decision whether to
perform the optimization (we do so if the expression is nonzero). If
:samp:`{condition}` is omitted (in other words, the string is empty) then the
optimization is applied to every sequence of insns that matches the
patterns.

The defined peephole optimizations are applied after register allocation
is complete. Therefore, the peephole definition can check which
operands have ended up in which kinds of registers, just by looking at
the operands.

.. index:: prev_active_insn

The way to refer to the operands in :samp:`{condition}` is to write
``operands[i]`` for operand number :samp:`{i}` (as matched by
``(match_operand i ...)``). Use the variable ``insn``
to refer to the last of the insns being matched; use
``prev_active_insn`` to find the preceding insns.

.. index:: dead_or_set_p

When optimizing computations with intermediate results, you can use
:samp:`{condition}` to match only when the intermediate results are not used
elsewhere. Use the C expression ``dead_or_set_p (insn, op)``,
where :samp:`{insn}` is the insn in which you expect the value
to be used for the last time (from the value of ``insn``, together
with use of ``prev_nonnote_insn``), and :samp:`{op}` is the intermediate
value (from ``operands[i]``).
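
The following sketch shows how such a condition might look. It is a
hypothetical pattern, not taken from a real port: the ``add_from_mem``
mnemonic is invented for illustration, and a real machine description
would also need ``define_insn`` patterns matching both insns. The value
loaded into operand 0 is an intermediate result, so the peephole is applied
only when that register dies in, or is overwritten by, the final insn. The
extra ``reg_overlap_mentioned_p`` test is an added safeguard so that the
other addend is not the intermediate register itself:

.. code-block::

  ;; Hypothetical example; "add_from_mem" is an invented mnemonic.
  (define_peephole
    [(set (match_operand:SI 0 "register_operand" "=r")
          (match_operand:SI 1 "memory_operand" "m"))
     (set (match_operand:SI 2 "register_operand" "=r")
          (plus:SI (match_dup 0)
                   (match_operand:SI 3 "register_operand" "r")))]
    ;; insn is the last matched insn (the addition); operands[0] is
    ;; the intermediate value loaded by the first insn.
    "dead_or_set_p (insn, operands[0])
     && ! reg_overlap_mentioned_p (operands[0], operands[3])"
    "add_from_mem %2,%3,%1")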

Applying the optimization means replacing the sequence of insns with one
new insn. The :samp:`{template}` controls ultimate output of assembler code
for this combined insn. It works exactly like the template of a
``define_insn``. Operand numbers in this template are the same ones
used in matching the original sequence of insns.

The result of a defined peephole optimizer does not need to match any of
the insn patterns in the machine description; it does not even have an
opportunity to match them. The peephole optimizer definition itself serves
as the insn pattern to control how the insn is output.

Defined peephole optimizers are run as assembler code is being output,
so the insns they produce are never combined or rearranged in any way.

Here is an example, taken from the 68000 machine description:

.. code-block::

  (define_peephole
    [(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4)))
     (set (match_operand:DF 0 "register_operand" "=f")
          (match_operand:DF 1 "register_operand" "ad"))]
    "FP_REG_P (operands[0]) && ! FP_REG_P (operands[1])"
  {
    rtx xoperands[2];
    /* Operand 1 occupies a register pair; name its second register.  */
    xoperands[1] = gen_rtx_REG (SImode, REGNO (operands[1]) + 1);
  #ifdef MOTOROLA
    /* Store over the stack slot that the add would have popped.  */
    output_asm_insn ("move.l %1,(sp)", xoperands);
    output_asm_insn ("move.l %1,-(sp)", operands);
    return "fmove.d (sp)+,%0";
  #else
    output_asm_insn ("movel %1,sp@", xoperands);
    output_asm_insn ("movel %1,sp@-", operands);
    return "fmoved sp@+,%0";
  #endif
  })

The effect of this optimization is to change

.. code-block::

  jbsr _foobar
  addql #4,sp
  movel d1,sp@-
  movel d0,sp@-
  fmoved sp@+,fp0

into

.. code-block::

  jbsr _foobar
  movel d1,sp@
  movel d0,sp@-
  fmoved sp@+,fp0

If a peephole matches a sequence including one or more jump insns, you must
take account of the flags such as ``CC_REVERSED`` which specify that the
condition codes are represented in an unusual manner. The compiler
automatically alters any ordinary conditional jumps which occur in such
situations, but the compiler cannot alter jumps which have been replaced by
peephole optimizations. So it is up to you to alter the assembler code
that the peephole produces. Supply C code to write the assembler output,
and in this C code check the condition code status flags and change the
assembler code as appropriate.

:samp:`{insn-pattern-1}` and so on look *almost* like the second
operand of ``define_insn``. There is one important difference: the
second operand of ``define_insn`` consists of one or more RTX's
enclosed in square brackets. Usually, there is only one: then the same
action can be written as an element of a ``define_peephole``. But
when there are multiple actions in a ``define_insn``, they are
implicitly enclosed in a ``parallel``. Then you must explicitly
write the ``parallel``, and the square brackets within it, in the
``define_peephole``. Thus, if an insn pattern looks like this,

.. code-block::

  (define_insn "divmodsi4"
    [(set (match_operand:SI 0 "general_operand" "=d")
          (div:SI (match_operand:SI 1 "general_operand" "0")
                  (match_operand:SI 2 "general_operand" "dmsK")))
     (set (match_operand:SI 3 "general_operand" "=d")
          (mod:SI (match_dup 1) (match_dup 2)))]
    "TARGET_68020"
    "divsl%.l %2,%3:%0")

then the way to mention this insn in a peephole is as follows:

.. code-block::

  (define_peephole
    [...
     (parallel
      [(set (match_operand:SI 0 "general_operand" "=d")
            (div:SI (match_operand:SI 1 "general_operand" "0")
                    (match_operand:SI 2 "general_operand" "dmsK")))
       (set (match_operand:SI 3 "general_operand" "=d")
            (mod:SI (match_dup 1) (match_dup 2)))])
     ...]
    ...)

.. index:: define_peephole2

.. _define_peephole2:

RTL to RTL Peephole Optimizers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``define_peephole2`` definition tells the compiler how to
substitute one sequence of instructions for another sequence,
what additional scratch registers may be needed and what their
lifetimes must be.

.. code-block::

  (define_peephole2
    [insn-pattern-1
     insn-pattern-2
     ...]
    "condition"
    [new-insn-pattern-1
     new-insn-pattern-2
     ...]
    "preparation-statements")

The definition is almost identical to ``define_split``
(see :ref:`insn-splitting`) except that the pattern to match is not a
single instruction, but a sequence of instructions.

It is possible to request additional scratch registers for use in the
output template. If appropriate registers are not free, the pattern
will simply not match.

.. index:: match_scratch, match_dup

Scratch registers are requested with a ``match_scratch`` pattern at
the top level of the input pattern. The allocated register (initially) will
be dead at the point requested within the original sequence. If the scratch
is used at more than a single point, a ``match_dup`` pattern at the
top level of the input pattern marks the last position in the input sequence
at which the register must be available.

Here is an example from the IA-32 machine description:

.. code-block::

  (define_peephole2
    [(match_scratch:SI 2 "r")
     (parallel [(set (match_operand:SI 0 "register_operand" "")
                     (match_operator:SI 3 "arith_or_logical_operator"
                       [(match_dup 0)
                        (match_operand:SI 1 "memory_operand" "")]))
                (clobber (reg:CC 17))])]
    "! optimize_size && ! TARGET_READ_MODIFY"
    [(set (match_dup 2) (match_dup 1))
     (parallel [(set (match_dup 0)
                     (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
                (clobber (reg:CC 17))])]
    "")

This pattern tries to split a load from its use in the hopes that we'll be
able to schedule around the memory load latency. It allocates a single
``SImode`` register of class ``GENERAL_REGS`` (``"r"``) that needs
to be live only at the point just before the arithmetic.

A real example requiring extended scratch lifetimes is harder to come by,
so here's a silly made-up example:

.. code-block::

  (define_peephole2
    [(match_scratch:SI 4 "r")
     (set (match_operand:SI 0 "" "") (match_operand:SI 1 "" ""))
     (set (match_operand:SI 2 "" "") (match_dup 1))
     (match_dup 4)
     (set (match_operand:SI 3 "" "") (match_dup 1))]
    "/* determine 1 does not overlap 0 and 2 */"
    [(set (match_dup 4) (match_dup 1))
     (set (match_dup 0) (match_dup 4))
     (set (match_dup 2) (match_dup 4))
     (set (match_dup 3) (match_dup 4))]
    "")

There are two special macros defined for use in the preparation statements:
``DONE`` and ``FAIL``. Use them with a following semicolon,
as a statement.

.. index:: DONE

.. envvar:: DONE

  Use the ``DONE`` macro to end RTL generation for the peephole. The
  only RTL insns generated as replacement for the matched input insn will
  be those already emitted by explicit calls to ``emit_insn`` within
  the preparation statements; the replacement pattern is not used.

.. envvar:: FAIL

  Make the ``define_peephole2`` fail on this occasion. When a ``define_peephole2``
  fails, it means that the replacement was not truly available for the
  particular inputs it was given. In that case, GCC may still apply a
  later ``define_peephole2`` that also matches the given insn pattern.
  (Note that this is different from ``define_split``, where ``FAIL``
  prevents the input insn from being split at all.)

If the preparation falls through (invokes neither ``DONE`` nor
``FAIL``), then the ``define_peephole2`` uses the replacement
template.
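
As a sketch of how the preparation statements interact with the
replacement template, consider the following hypothetical pattern, which
folds a constant load into a following addition. It is not taken from a
real machine description. The helper ``example_add_immediate_ok_p`` is an
assumed target function invented for this illustration, and the offset
``2`` passed to ``peep2_reg_dead_p`` is assumed to denote the point just
after the second matched insn:

.. code-block::

  ;; Hypothetical example, not taken from a real machine description.
  (define_peephole2
    [(set (match_operand:SI 0 "register_operand" "")
          (match_operand:SI 1 "const_int_operand" ""))
     (set (match_operand:SI 2 "register_operand" "")
          (plus:SI (match_operand:SI 3 "register_operand" "")
                   (match_dup 0)))]
    ;; Operand 0 must be dead after the sequence and must not also
    ;; be the other addend.
    "peep2_reg_dead_p (2, operands[0])
     && ! reg_overlap_mentioned_p (operands[0], operands[3])"
    [(set (match_dup 2)
          (plus:SI (match_dup 3) (match_dup 1)))]
  {
    /* example_add_immediate_ok_p is an assumed helper.  Give up on
       this occasion if the constant cannot be used as an immediate.  */
    if (! example_add_immediate_ok_p (INTVAL (operands[1])))
      FAIL;
    /* Falling through: the replacement template above is used.  */
  })

Had the preparation statements instead emitted the new insn with
``emit_insn`` and ended with ``DONE;``, the replacement template would
not be used.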

In the made-up example above, if we had not added the ``(match_dup 4)`` in
the middle of the input sequence, it might have been the case that the
register we chose at the beginning of the sequence is killed by the first
or second ``set``.