Commit | Line | Data |
---|---|---|
c63539ff ML |
1 | .. |
2 | Copyright 1988-2022 Free Software Foundation, Inc. | |
3 | This is part of the GCC manual. | |
4 | For copying conditions, see the copyright.rst file. | |
5 | ||
6 | .. index:: optimize options, options, optimization | |
7 | ||
8 | .. _optimize-options: | |
9 | ||
10 | Options That Control Optimization | |
11 | ********************************* | |
12 | ||
13 | These options control various sorts of optimizations. | |
14 | ||
15 | Without any optimization option, the compiler's goal is to reduce the | |
16 | cost of compilation and to make debugging produce the expected | |
17 | results. Statements are independent: if you stop the program with a | |
18 | breakpoint between statements, you can then assign a new value to any | |
19 | variable or change the program counter to any other statement in the | |
20 | function and get exactly the results you expect from the source | |
21 | code. | |
22 | ||
23 | Turning on optimization flags makes the compiler attempt to improve | |
24 | the performance and/or code size at the expense of compilation time | |
25 | and possibly the ability to debug the program. | |
26 | ||
27 | The compiler performs optimization based on the knowledge it has of the | |
28 | program. Compiling multiple files at once into a single output file allows | |
29 | the compiler to use information gained from all of the files when compiling | |
30 | each of them. | |
31 | ||
32 | Not all optimizations are controlled directly by a flag. Only | |
33 | optimizations that have a flag are listed in this section. | |
34 | ||
35 | Most optimizations are completely disabled at :option:`-O0` or if an | |
36 | :option:`-O` level is not set on the command line, even if individual | |
37 | optimization flags are specified. Similarly, :option:`-Og` suppresses | |
38 | many optimization passes. | |
39 | ||
40 | Depending on the target and how GCC was configured, a slightly different | |
41 | set of optimizations may be enabled at each :option:`-O` level than | |
42 | those listed here. You can invoke GCC with :option:`-Q --help=optimizers` | |
43 | to find out the exact set of optimizations that are enabled at each level. | |
44 | See :ref:`overall-options`, for examples. | |
45 | ||
46 | .. option:: -O, -O1 | |
47 | ||
48 | Optimize. Optimizing compilation takes somewhat more time, and a lot | |
49 | more memory for a large function. | |
50 | ||
51 | With :option:`-O`, the compiler tries to reduce code size and execution | |
52 | time, without performing any optimizations that take a great deal of | |
53 | compilation time. | |
54 | ||
55 | .. Note that in addition to the default_options_table list in opts.cc, | |
56 | several optimization flags default to true but control optimization | |
57 | passes that are explicitly disabled at -O0. | |
58 | ||
59 | :option:`-O` turns on the following optimization flags: | |
60 | ||
61 | .. Please keep the following list alphabetized. | |
62 | ||
63 | :option:`-fauto-inc-dec` |gol| | |
64 | :option:`-fbranch-count-reg` |gol| | |
65 | :option:`-fcombine-stack-adjustments` |gol| | |
66 | :option:`-fcompare-elim` |gol| | |
67 | :option:`-fcprop-registers` |gol| | |
68 | :option:`-fdce` |gol| | |
69 | :option:`-fdefer-pop` |gol| | |
70 | :option:`-fdelayed-branch` |gol| | |
71 | :option:`-fdse` |gol| | |
72 | :option:`-fforward-propagate` |gol| | |
73 | :option:`-fguess-branch-probability` |gol| | |
74 | :option:`-fif-conversion` |gol| | |
75 | :option:`-fif-conversion2` |gol| | |
76 | :option:`-finline-functions-called-once` |gol| | |
77 | :option:`-fipa-modref` |gol| | |
78 | :option:`-fipa-profile` |gol| | |
79 | :option:`-fipa-pure-const` |gol| | |
80 | :option:`-fipa-reference` |gol| | |
81 | :option:`-fipa-reference-addressable` |gol| | |
82 | :option:`-fmerge-constants` |gol| | |
83 | :option:`-fmove-loop-invariants` |gol| | |
84 | :option:`-fmove-loop-stores` |gol| | |
85 | :option:`-fomit-frame-pointer` |gol| | |
86 | :option:`-freorder-blocks` |gol| | |
87 | :option:`-fshrink-wrap` |gol| | |
88 | :option:`-fshrink-wrap-separate` |gol| | |
89 | :option:`-fsplit-wide-types` |gol| | |
90 | :option:`-fssa-backprop` |gol| | |
91 | :option:`-fssa-phiopt` |gol| | |
92 | :option:`-ftree-bit-ccp` |gol| | |
93 | :option:`-ftree-ccp` |gol| | |
94 | :option:`-ftree-ch` |gol| | |
95 | :option:`-ftree-coalesce-vars` |gol| | |
96 | :option:`-ftree-copy-prop` |gol| | |
97 | :option:`-ftree-dce` |gol| | |
98 | :option:`-ftree-dominator-opts` |gol| | |
99 | :option:`-ftree-dse` |gol| | |
100 | :option:`-ftree-forwprop` |gol| | |
101 | :option:`-ftree-fre` |gol| | |
102 | :option:`-ftree-phiprop` |gol| | |
103 | :option:`-ftree-pta` |gol| | |
104 | :option:`-ftree-scev-cprop` |gol| | |
105 | :option:`-ftree-sink` |gol| | |
106 | :option:`-ftree-slsr` |gol| | |
107 | :option:`-ftree-sra` |gol| | |
108 | :option:`-ftree-ter` |gol| | |
109 | :option:`-funit-at-a-time` | |
110 | ||
111 | .. option:: -O2 | |
112 | ||
113 | Optimize even more. GCC performs nearly all supported optimizations | |
114 | that do not involve a space-speed tradeoff. | |
115 | As compared to :option:`-O`, this option increases both compilation time | |
116 | and the performance of the generated code. | |
117 | ||
118 | :option:`-O2` turns on all optimization flags specified by :option:`-O1`. It | |
119 | also turns on the following optimization flags: | |
120 | ||
121 | .. Please keep the following list alphabetized! | |
122 | ||
123 | :option:`-falign-functions` :option:`-falign-jumps` |gol| | |
124 | :option:`-falign-labels` :option:`-falign-loops` |gol| | |
125 | :option:`-fcaller-saves` |gol| | |
126 | :option:`-fcode-hoisting` |gol| | |
127 | :option:`-fcrossjumping` |gol| | |
128 | :option:`-fcse-follow-jumps` :option:`-fcse-skip-blocks` |gol| | |
129 | :option:`-fdelete-null-pointer-checks` |gol| | |
130 | :option:`-fdevirtualize` :option:`-fdevirtualize-speculatively` |gol| | |
131 | :option:`-fexpensive-optimizations` |gol| | |
132 | :option:`-ffinite-loops` |gol| | |
133 | :option:`-fgcse` :option:`-fgcse-lm` |gol| | |
134 | :option:`-fhoist-adjacent-loads` |gol| | |
135 | :option:`-finline-functions` |gol| | |
136 | :option:`-finline-small-functions` |gol| | |
137 | :option:`-findirect-inlining` |gol| | |
138 | :option:`-fipa-bit-cp` :option:`-fipa-cp` :option:`-fipa-icf` |gol| | |
139 | :option:`-fipa-ra` :option:`-fipa-sra` :option:`-fipa-vrp` |gol| | |
140 | :option:`-fisolate-erroneous-paths-dereference` |gol| | |
141 | :option:`-flra-remat` |gol| | |
142 | :option:`-foptimize-sibling-calls` |gol| | |
143 | :option:`-foptimize-strlen` |gol| | |
144 | :option:`-fpartial-inlining` |gol| | |
145 | :option:`-fpeephole2` |gol| | |
146 | :option:`-freorder-blocks-algorithm=stc` |gol| | |
147 | :option:`-freorder-blocks-and-partition` :option:`-freorder-functions` |gol| | |
148 | :option:`-frerun-cse-after-loop` |gol| | |
149 | :option:`-fschedule-insns` :option:`-fschedule-insns2` |gol| | |
150 | :option:`-fsched-interblock` :option:`-fsched-spec` |gol| | |
151 | :option:`-fstore-merging` |gol| | |
152 | :option:`-fstrict-aliasing` |gol| | |
153 | :option:`-fthread-jumps` |gol| | |
154 | :option:`-ftree-builtin-call-dce` |gol| | |
155 | :option:`-ftree-loop-vectorize` |gol| | |
156 | :option:`-ftree-pre` |gol| | |
157 | :option:`-ftree-slp-vectorize` |gol| | |
158 | :option:`-ftree-switch-conversion` :option:`-ftree-tail-merge` |gol| | |
159 | :option:`-ftree-vrp` |gol| | |
160 | :option:`-fvect-cost-model=very-cheap` | |
161 | ||
162 | Please note the warning under :option:`-fgcse` about | |
163 | invoking :option:`-O2` on programs that use computed gotos. | |
164 | ||
165 | .. option:: -O3 | |
166 | ||
167 | Optimize yet more. :option:`-O3` turns on all optimizations specified | |
168 | by :option:`-O2` and also turns on the following optimization flags: | |
169 | ||
170 | .. Please keep the following list alphabetized! | |
171 | ||
172 | :option:`-fgcse-after-reload` |gol| | |
173 | :option:`-fipa-cp-clone` |gol| | |
174 | :option:`-floop-interchange` |gol| | |
175 | :option:`-floop-unroll-and-jam` |gol| | |
176 | :option:`-fpeel-loops` |gol| | |
177 | :option:`-fpredictive-commoning` |gol| | |
178 | :option:`-fsplit-loops` |gol| | |
179 | :option:`-fsplit-paths` |gol| | |
180 | :option:`-ftree-loop-distribution` |gol| | |
181 | :option:`-ftree-partial-pre` |gol| | |
182 | :option:`-funswitch-loops` |gol| | |
183 | :option:`-fvect-cost-model=dynamic` |gol| | |
184 | :option:`-fversion-loops-for-strides` | |
185 | ||
186 | .. option:: -O0 | |
187 | ||
188 | Reduce compilation time and make debugging produce the expected | |
189 | results. This is the default. | |
190 | ||
191 | .. option:: -Os | |
192 | ||
193 | Optimize for size. :option:`-Os` enables all :option:`-O2` optimizations | |
194 | except those that often increase code size: | |
195 | ||
196 | :option:`-falign-functions` :option:`-falign-jumps` |gol| | |
197 | :option:`-falign-labels` :option:`-falign-loops` |gol| | |
198 | :option:`-fprefetch-loop-arrays` :option:`-freorder-blocks-algorithm=stc` | |

199 | It also enables :option:`-finline-functions`, causes the compiler to tune for | |
200 | code size rather than execution speed, and performs further optimizations | |
201 | designed to reduce code size. | |
202 | ||
203 | .. option:: -Ofast | |
204 | ||
205 | Disregard strict standards compliance. :option:`-Ofast` enables all | |
206 | :option:`-O3` optimizations. It also enables optimizations that are not | |
207 | valid for all standard-compliant programs. | |
208 | It turns on :option:`-ffast-math`, :option:`-fallow-store-data-races` | |
209 | and the Fortran-specific :option:`-fstack-arrays`, unless | |
210 | :option:`-fmax-stack-var-size` is specified, and :option:`-fno-protect-parens`. | |
211 | It turns off :option:`-fsemantic-interposition`. | |
212 | ||
213 | .. option:: -Og | |
214 | ||
215 | Optimize debugging experience. :option:`-Og` should be the optimization | |
216 | level of choice for the standard edit-compile-debug cycle, offering | |
217 | a reasonable level of optimization while maintaining fast compilation | |
218 | and a good debugging experience. It is a better choice than :option:`-O0` | |
219 | for producing debuggable code because some compiler passes | |
220 | that collect debug information are disabled at :option:`-O0`. | |
221 | ||
222 | Like :option:`-O0`, :option:`-Og` completely disables a number of | |
223 | optimization passes so that individual options controlling them have | |
224 | no effect. Otherwise :option:`-Og` enables all :option:`-O1` | |
225 | optimization flags except for those that may interfere with debugging: | |
226 | ||
227 | :option:`-fbranch-count-reg` :option:`-fdelayed-branch` |gol| | |
228 | :option:`-fdse` :option:`-fif-conversion` :option:`-fif-conversion2` |gol| | |
229 | :option:`-finline-functions-called-once` |gol| | |
230 | :option:`-fmove-loop-invariants` :option:`-fmove-loop-stores` :option:`-fssa-phiopt` |gol| | |
231 | :option:`-ftree-bit-ccp` :option:`-ftree-dse` :option:`-ftree-pta` :option:`-ftree-sra` | |
232 | ||
233 | .. option:: -Oz | |
234 | ||
235 | Optimize aggressively for size rather than speed. This may increase | |
236 | the number of instructions executed if those instructions require | |
237 | fewer bytes to encode. :option:`-Oz` behaves similarly to :option:`-Os` | |
238 | including enabling most :option:`-O2` optimizations. | |
239 | ||
240 | If you use multiple :option:`-O` options, with or without level numbers, | |
241 | the last such option is the one that is effective. | |
242 | ||
243 | Options of the form :samp:`-fflag` specify machine-independent | |
244 | flags. Most flags have both positive and negative forms; the negative | |
245 | form of :samp:`-ffoo` is :samp:`-fno-foo`. In the table | |
246 | below, only one of the forms is listed---the one you typically | |
247 | use. You can figure out the other form by either removing :samp:`no-` | |
248 | or adding it. | |
249 | ||
250 | The following options control specific optimizations. They are either | |
251 | activated by :option:`-O` options or are related to ones that are. You | |
252 | can use the following flags in the rare cases when 'fine-tuning' of | |
253 | optimizations to be performed is desired. | |
254 | ||
255 | .. option:: -fno-defer-pop | |
256 | ||
257 | For machines that must pop arguments after a function call, always pop | |
258 | the arguments as soon as each function returns. | |
259 | At levels :option:`-O1` and higher, :option:`-fdefer-pop` is the default; | |
260 | this allows the compiler to let arguments accumulate on the stack for several | |
261 | function calls and pop them all at once. | |
262 | ||
263 | .. option:: -fdefer-pop | |
264 | ||
265 | Default setting; overrides :option:`-fno-defer-pop`. | |
266 | ||
267 | .. option:: -fforward-propagate | |
268 | ||
269 | Perform a forward propagation pass on RTL. The pass tries to combine two | |
270 | instructions and checks if the result can be simplified. If loop unrolling | |
271 | is active, two passes are performed and the second is scheduled after | |
272 | loop unrolling. | |
273 | ||
274 | This option is enabled by default at optimization levels :option:`-O1`, | |
275 | :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
276 | ||
277 | .. option:: -ffp-contract={style} | |
278 | ||
279 | :option:`-ffp-contract=off` disables floating-point expression contraction. | |
280 | :option:`-ffp-contract=fast` enables floating-point expression contraction | |
281 | such as forming of fused multiply-add operations if the target has | |
282 | native support for them. | |
283 | :option:`-ffp-contract=on` enables floating-point expression contraction | |
284 | if allowed by the language standard. This is currently not implemented | |
285 | and is treated the same as :option:`-ffp-contract=off`. | |
286 | ||
287 | The default is :option:`-ffp-contract=fast`. | |
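As a rough illustration of what contraction acts on, the sketch below shows a multiply followed by an add. The function name ``mul_add`` is hypothetical; whether a fused multiply-add is actually emitted depends on the target and on the :option:`-ffp-contract` setting.

```c
/* Hypothetical helper: a multiply feeding an add, the expression shape
   that floating-point contraction can fuse into one FMA instruction. */
double mul_add(double a, double b, double c)
{
    /* Unfused: a * b is rounded, then the sum is rounded again.
       Fused: only the final result is rounded, so the last bit of the
       result may differ for some inputs. */
    return a * b + c;
}
```

For inputs whose product and sum are exactly representable, such as ``mul_add(2.0, 3.0, 4.0)``, both forms give the same result; differences can only appear when the intermediate product needs rounding.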
288 | ||
289 | .. option:: -fomit-frame-pointer | |
290 | ||
291 | Omit the frame pointer in functions that don't need one. This avoids the | |
292 | instructions to save, set up and restore the frame pointer; on many targets | |
293 | it also makes an extra register available. | |
294 | ||
295 | On some targets this flag has no effect because the standard calling sequence | |
296 | always uses a frame pointer, so it cannot be omitted. | |
297 | ||
298 | Note that :option:`-fno-omit-frame-pointer` doesn't guarantee the frame pointer | |
299 | is used in all functions. Several targets always omit the frame pointer in | |
300 | leaf functions. | |
301 | ||
302 | Enabled by default at :option:`-O1` and higher. | |
303 | ||
304 | .. option:: -foptimize-sibling-calls | |
305 | ||
306 | Optimize sibling and tail recursive calls. | |
307 | ||
308 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
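A minimal sketch of a tail-recursive call of the kind this option targets. The function ``sum_to`` is a hypothetical example, and whether the call is actually turned into a jump depends on the target and the optimization level.

```c
/* Hypothetical example: sums 1..n using an accumulator so that the
   recursive call is the last action of the function. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    /* Tail call: nothing remains to be done after it returns, so the
       compiler may reuse the current stack frame instead of creating
       a new one for each level of recursion. */
    return sum_to(n - 1, acc + n);
}
```

Calling ``sum_to(10, 0)`` computes 55. Without the optimization each level of recursion uses its own stack frame, so very large ``n`` can exhaust the stack.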
309 | ||
310 | .. option:: -foptimize-strlen | |
311 | ||
312 | Optimize various standard C string functions (e.g. ``strlen``, | |
313 | ``strchr`` or ``strcpy``) and | |
314 | their ``_FORTIFY_SOURCE`` counterparts into faster alternatives. | |
315 | ||
316 | Enabled at levels :option:`-O2`, :option:`-O3`. | |
317 | ||
318 | .. option:: -fno-inline | |
319 | ||
320 | Do not expand any functions inline apart from those marked with | |
321 | the :fn-attr:`always_inline` attribute. This is the default when not | |
322 | optimizing. | |
323 | ||
324 | Single functions can be exempted from inlining by marking them | |
325 | with the :fn-attr:`noinline` attribute. | |
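The two attributes mentioned above can be sketched as follows. The function names are hypothetical; the attribute spelling is the GNU ``__attribute__`` syntax.

```c
/* Hypothetical helper marked always_inline: it is still expanded
   inline when -fno-inline is in effect. */
static inline __attribute__((always_inline)) int twice(int x)
{
    return 2 * x;
}

/* Hypothetical helper marked noinline: it is exempted from inlining
   regardless of the optimization level. */
__attribute__((noinline)) int add_one(int x)
{
    return x + 1;
}

int combine(int x)
{
    /* twice() is expanded here; add_one() remains a real call. */
    return add_one(twice(x));
}
```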
326 | ||
327 | .. option:: -finline | |
328 | ||
329 | Default setting; overrides :option:`-fno-inline`. | |
330 | ||
331 | .. option:: -finline-small-functions | |
332 | ||
333 | Integrate functions into their callers when their body is smaller than the expected | |
334 | function call code (so the overall size of the program gets smaller). The compiler | |
335 | heuristically decides which functions are simple enough to be worth integrating | |
336 | in this way. This inlining applies to all functions, even those not declared | |
337 | inline. | |
338 | ||
339 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
340 | ||
341 | .. option:: -findirect-inlining | |
342 | ||
343 | Also inline indirect calls that are discovered to be known at compile | |
344 | time thanks to previous inlining. This option has an effect only | |
345 | when inlining itself is turned on by the :option:`-finline-functions` | |
346 | or :option:`-finline-small-functions` options. | |
347 | ||
348 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
349 | ||
350 | .. option:: -finline-functions | |
351 | ||
352 | Consider all functions for inlining, even if they are not declared inline. | |
353 | The compiler heuristically decides which functions are worth integrating | |
354 | in this way. | |
355 | ||
356 | If all calls to a given function are integrated, and the function is | |
357 | declared ``static``, then the function is normally not output as | |
358 | assembler code in its own right. | |
359 | ||
360 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. Also enabled | |
361 | by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
362 | ||
363 | .. option:: -finline-functions-called-once | |
364 | ||
365 | Consider all ``static`` functions called once for inlining into their | |
366 | caller even if they are not marked ``inline``. If a call to a given | |
367 | function is integrated, then the function is not output as assembler code | |
368 | in its own right. | |
369 | ||
370 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3` and :option:`-Os`, | |
371 | but not :option:`-Og`. | |
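The situation this option addresses can be pictured with a ``static`` helper that has a single call site. Both function names below are hypothetical.

```c
/* Hypothetical static helper with exactly one caller. It is not
   declared inline, but it is a candidate for this option. */
static int weighted(int a, int b, int c)
{
    return 3 * a + 2 * b + c;
}

int score(int a, int b, int c)
{
    /* The only call to weighted(): if it is integrated here, no
       separate copy of weighted() needs to be emitted. */
    return weighted(a, b, c) - 1;
}
```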
372 | ||
373 | .. option:: -fearly-inlining | |
374 | ||
375 | Inline functions marked by :fn-attr:`always_inline` and functions whose body seems | |
376 | smaller than the function call overhead early, before doing | |
377 | :option:`-fprofile-generate` instrumentation and the real inlining pass. Doing so | |
378 | makes profiling significantly cheaper and usually makes inlining faster on programs | |
379 | having large chains of nested wrapper functions. | |
380 | ||
381 | Enabled by default. | |
382 | ||
383 | .. option:: -fipa-sra | |
384 | ||
385 | Perform interprocedural scalar replacement of aggregates, removal of | |
386 | unused parameters and replacement of parameters passed by reference | |
387 | by parameters passed by value. | |
388 | ||
389 | Enabled at levels :option:`-O2`, :option:`-O3` and :option:`-Os`. | |
390 | ||
391 | .. option:: -finline-limit={n} | |
392 | ||
393 | By default, GCC limits the size of functions that can be inlined. This flag | |
394 | allows coarse control of this limit. :samp:`{n}` is the size of functions that | |
395 | can be inlined in number of pseudo instructions. | |
396 | ||
397 | Inlining is actually controlled by a number of parameters, which may be | |
398 | specified individually by using :option:`--param name=value`. | |
399 | The :option:`-finline-limit=n` option sets some of these parameters | |
400 | as follows: | |
401 | ||
402 | ``max-inline-insns-single`` | |
403 | is set to :samp:`{n}/2`. | |
404 | ||
405 | ``max-inline-insns-auto`` | |
406 | is set to :samp:`{n}/2`. | |
407 | ||
408 | See below for documentation of the individual | |
409 | parameters controlling inlining and for the defaults of these parameters. | |
410 | ||
411 | .. note:: | |
412 | There may be no value to :option:`-finline-limit` that results | |
413 | in default behavior. | |
414 | ||
415 | .. note:: | |
416 | A pseudo instruction represents, in this particular context, an | |
417 | abstract measurement of a function's size. In no way does it represent a count | |
418 | of assembly instructions and as such its exact meaning might change from one | |
419 | release to another. | |
420 | ||
421 | .. option:: -fno-keep-inline-dllexport | |
422 | ||
423 | This is a more fine-grained version of :option:`-fkeep-inline-functions`, | |
424 | which applies only to functions that are declared using the :microsoft-windows-fn-attr:`dllexport` | |
425 | attribute or declspec. See :ref:`function-attributes`. | |
426 | ||
427 | .. option:: -fkeep-inline-dllexport | |
428 | ||
429 | Default setting; overrides :option:`-fno-keep-inline-dllexport`. | |
430 | ||
431 | .. option:: -fkeep-inline-functions | |
432 | ||
433 | In C, emit ``static`` functions that are declared ``inline`` | |
434 | into the object file, even if the function has been inlined into all | |
435 | of its callers. This switch does not affect functions using the | |
436 | ``extern inline`` extension in GNU C90. In C++, emit any and all | |
437 | inline functions into the object file. | |
438 | ||
439 | .. option:: -fkeep-static-functions | |
440 | ||
441 | Emit ``static`` functions into the object file, even if the function | |
442 | is never used. | |
443 | ||
444 | .. option:: -fkeep-static-consts | |
445 | ||
446 | Emit variables declared ``static const`` when optimization isn't turned | |
447 | on, even if the variables aren't referenced. | |
448 | ||
449 | GCC enables this option by default. If you want to force the compiler to | |
450 | check if a variable is referenced, regardless of whether or not | |
451 | optimization is turned on, use the :option:`-fno-keep-static-consts` option. | |
452 | ||
453 | .. option:: -fmerge-constants | |
454 | ||
455 | Attempt to merge identical constants (string constants and floating-point | |
456 | constants) across compilation units. | |
457 | ||
458 | This option is the default for optimized compilation if the assembler and | |
459 | linker support it. Use :option:`-fno-merge-constants` to inhibit this | |
460 | behavior. | |
461 | ||
462 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
463 | ||
464 | .. option:: -fmerge-all-constants | |
465 | ||
466 | Attempt to merge identical constants and identical variables. | |
467 | ||
468 | This option implies :option:`-fmerge-constants`. In addition to | |
469 | :option:`-fmerge-constants` this considers e.g. even constant initialized | |
470 | arrays or initialized constant variables with integral or floating-point | |
471 | types. Languages like C or C++ require each variable, including multiple | |
472 | instances of the same variable in recursive calls, to have distinct locations, | |
473 | so using this option results in non-conforming | |
474 | behavior. | |
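To make the conformance issue concrete, here is a small sketch with two distinct constant arrays that hold identical contents. All names are hypothetical. In a conforming compilation the two objects have different addresses; with :option:`-fmerge-all-constants` they may end up sharing storage.

```c
#include <string.h>

/* Two separate objects with identical initializers. */
static const int first[3]  = { 1, 2, 3 };
static const int second[3] = { 1, 2, 3 };

/* Always true: the contents match. */
int contents_equal(void)
{
    return memcmp(first, second, sizeof first) == 0;
}

/* True in a conforming compilation, because distinct objects have
   distinct addresses. Merging the two arrays can change this. */
int addresses_differ(void)
{
    return first != second;
}
```

Code that relies on ``addresses_differ()`` being true, for example by using an object's address as an identity, is the kind of code that this option can break.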
475 | ||
476 | .. option:: -fmodulo-sched | |
477 | ||
478 | Perform swing modulo scheduling immediately before the first scheduling | |
479 | pass. This pass looks at innermost loops and reorders their | |
480 | instructions by overlapping different iterations. | |
481 | ||
482 | .. option:: -fmodulo-sched-allow-regmoves | |
483 | ||
484 | Perform more aggressive SMS-based modulo scheduling with register moves | |
485 | allowed. By setting this flag certain anti-dependence edges are | |
486 | deleted, which triggers the generation of reg-moves based on the | |
487 | life-range analysis. This option is effective only with | |
488 | :option:`-fmodulo-sched` enabled. | |
489 | ||
490 | .. option:: -fno-branch-count-reg | |
491 | ||
492 | Disable the optimization pass that scans for opportunities to use | |
493 | 'decrement and branch' instructions on a count register instead of | |
494 | instruction sequences that decrement a register, compare it against zero, and | |
495 | then branch based upon the result. This option is only meaningful on | |
496 | architectures that support such instructions, which include x86, PowerPC, | |
497 | IA-64 and S/390. Note that the :option:`-fno-branch-count-reg` option | |
498 | doesn't remove the decrement and branch instructions from the generated | |
499 | instruction stream introduced by other optimization passes. | |
500 | ||
501 | The default is :option:`-fbranch-count-reg` at :option:`-O1` and higher, | |
502 | except for :option:`-Og`. | |
503 | ||
504 | .. option:: -fbranch-count-reg | |
505 | ||
506 | Default setting; overrides :option:`-fno-branch-count-reg`. | |
507 | ||
508 | .. option:: -fno-function-cse | |
509 | ||
510 | Do not put function addresses in registers; make each instruction that | |
511 | calls a constant function contain the function's address explicitly. | |
512 | ||
513 | This option results in less efficient code, but some strange hacks | |
514 | that alter the assembler output may be confused by the optimizations | |
515 | performed when this option is not used. | |
516 | ||
517 | The default is :option:`-ffunction-cse`. | |
518 | ||
519 | .. option:: -ffunction-cse | |
520 | ||
521 | Default setting; overrides :option:`-fno-function-cse`. | |
522 | ||
523 | .. option:: -fno-zero-initialized-in-bss | |
524 | ||
525 | If the target supports a BSS section, GCC by default puts variables that | |
526 | are initialized to zero into BSS. This can save space in the resulting | |
527 | code. | |
528 | ||
529 | This option turns off this behavior because some programs explicitly | |
530 | rely on variables going to the data section---e.g., so that the | |
531 | resulting executable can find the beginning of that section and/or make | |
532 | assumptions based on that. | |
533 | ||
534 | The default is :option:`-fzero-initialized-in-bss`. | |
535 | ||
536 | .. option:: -fzero-initialized-in-bss | |
537 | ||
538 | Default setting; overrides :option:`-fno-zero-initialized-in-bss`. | |
539 | ||
540 | .. option:: -fthread-jumps | |
541 | ||
542 | Perform optimizations that check to see if a jump branches to a | |
543 | location where another comparison subsumed by the first is found. If | |
544 | so, the first branch is redirected to either the destination of the | |
545 | second branch or a point immediately following it, depending on whether | |
546 | the condition is known to be true or false. | |
547 | ||
548 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
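A source-level picture of a comparison that is subsumed by an earlier one. The function ``classify`` is a hypothetical example, and the actual transformation happens on the compiler's internal representation, not on the source.

```c
/* Hypothetical example with two related comparisons. */
int classify(int x)
{
    int score = 0;

    if (x > 10)
        score += 1;
    /* On the path where x > 10 was true, x > 5 is already known to be
       true, so that path can jump straight to the addition below
       without evaluating the second comparison. */
    if (x > 5)
        score += 2;
    return score;
}
```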
549 | ||
550 | .. option:: -fsplit-wide-types | |
551 | ||
552 | When using a type that occupies multiple registers, such as ``long | |
553 | long`` on a 32-bit system, split the registers apart and allocate them | |
554 | independently. This normally generates better code for those types, | |
555 | but may make debugging more difficult. | |
556 | ||
557 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, | |
558 | :option:`-Os`. | |
559 | ||
560 | .. option:: -fsplit-wide-types-early | |
561 | ||
562 | Fully split wide types early, instead of very late. | |
563 | This option has no effect unless :option:`-fsplit-wide-types` is turned on. | |
564 | ||
565 | This is the default on some targets. | |
566 | ||
567 | .. option:: -fcse-follow-jumps | |
568 | ||
569 | In common subexpression elimination (CSE), scan through jump instructions | |
570 | when the target of the jump is not reached by any other path. For | |
571 | example, when CSE encounters an ``if`` statement with an | |
572 | ``else`` clause, CSE follows the jump when the condition | |
573 | tested is false. | |
574 | ||
575 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
576 | ||
577 | .. option:: -fcse-skip-blocks | |
578 | ||
579 | This is similar to :option:`-fcse-follow-jumps`, but causes CSE to | |
580 | follow jumps that conditionally skip over blocks. When CSE | |
581 | encounters a simple ``if`` statement with no else clause, | |
582 | :option:`-fcse-skip-blocks` causes CSE to follow the jump around the | |
583 | body of the ``if``. | |
584 | ||
585 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
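The pattern described above can be sketched at the source level as follows. The function ``scale`` and its parameters are hypothetical, and other passes may remove the redundancy before CSE ever sees it, so this only illustrates the shape of the code involved.

```c
/* Hypothetical example: the same expression on both sides of a simple
   if statement that has no else clause. */
int scale(int a, int b, int flag, int *log)
{
    int before = a * b;

    if (flag)
        *log = before;   /* body that the jump skips when flag is 0 */

    /* Same expression as above; a and b are unchanged, so following
       the jump around the if body lets the earlier result be reused. */
    int after = a * b;

    return before + after;
}
```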
586 | ||
587 | .. option:: -frerun-cse-after-loop | |
588 | ||
589 | Re-run common subexpression elimination after loop optimizations are | |
590 | performed. | |
591 | ||
592 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
593 | ||
594 | .. option:: -fgcse | |
595 | ||
596 | Perform a global common subexpression elimination pass. | |
597 | This pass also performs global constant and copy propagation. | |
598 | ||
599 | .. note:: | |
600 | ||
601 | When compiling a program using computed gotos, a GCC | |
602 | extension, you may get better run-time performance if you disable | |
603 | the global common subexpression elimination pass by adding | |
604 | :option:`-fno-gcse` to the command line. | |
605 | ||
606 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
607 | ||
608 | .. option:: -fgcse-lm | |
609 | ||
610 | When :option:`-fgcse-lm` is enabled, global common subexpression elimination | |
611 | attempts to move loads that are only killed by stores into themselves. This | |
612 | allows a loop containing a load/store sequence to be changed to a load outside | |
613 | the loop, and a copy/store within the loop. | |
614 | ||
615 | Enabled by default when :option:`-fgcse` is enabled. | |
616 | ||
617 | .. option:: -fgcse-sm | |
618 | ||
619 | When :option:`-fgcse-sm` is enabled, a store motion pass is run after | |
620 | global common subexpression elimination. This pass attempts to move | |
621 | stores out of loops. When used in conjunction with :option:`-fgcse-lm`, | |
622 | loops containing a load/store sequence can be changed to a load before | |
623 | the loop and a store after the loop. | |
624 | ||
625 | Not enabled at any optimization level. | |
626 | ||
627 | .. option:: -fgcse-las | |
628 | ||
629 | When :option:`-fgcse-las` is enabled, the global common subexpression | |
630 | elimination pass eliminates redundant loads that come after stores to the | |
631 | same memory location (both partial and full redundancies). | |
632 | ||
633 | Not enabled at any optimization level. | |
634 | ||
635 | .. option:: -fgcse-after-reload | |
636 | ||
637 | When :option:`-fgcse-after-reload` is enabled, a redundant load elimination | |
638 | pass is performed after reload. The purpose of this pass is to clean up | |
639 | redundant spilling. | |
640 | ||
641 | Enabled by :option:`-O3`, :option:`-fprofile-use` and :option:`-fauto-profile`. | |
642 | ||
643 | .. option:: -faggressive-loop-optimizations | |
644 | ||
645 | This option tells the loop optimizer to use language constraints to | |
646 | derive bounds for the number of iterations of a loop. This assumes that | |
647 | loop code does not invoke undefined behavior by for example causing signed | |
648 | integer overflows or out-of-bound array accesses. The bounds for the | |
649 | number of iterations of a loop are used to guide loop unrolling and peeling | |
650 | and loop exit test optimizations. | |
651 | This option is enabled by default. | |
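As an illustration of a bound derived from language constraints, consider a loop over a fixed-size array. The names ``table`` and ``sum_table`` are hypothetical.

```c
/* Hypothetical four-element table. */
static const int table[4] = { 1, 2, 3, 4 };

int sum_table(int n)
{
    int sum = 0;

    /* Reading table[i] with i >= 4 would be undefined behavior, so the
       loop optimizer may assume that the body runs at most four times,
       whatever value n has at run time. */
    for (int i = 0; i < n; i++)
        sum += table[i];
    return sum;
}
```

A caller that passes ``n`` greater than 4 has undefined behavior, and the generated code is not required to behave in any particular way in that case.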
652 | ||
653 | .. option:: -funconstrained-commons | |
654 | ||
655 | This option tells the compiler that variables declared in common blocks | |
656 | (e.g. Fortran) may later be overridden with longer trailing arrays. This | |
657 | prevents certain optimizations that depend on knowing the array bounds. | |
658 | ||
659 | .. option:: -fcrossjumping | |
660 | ||
661 | Perform cross-jumping transformation. | |
662 | This transformation unifies equivalent code and saves code size. The | |
663 | resulting code may or may not perform better than without cross-jumping. | |
664 | ||
665 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
666 | ||
667 | .. option:: -fauto-inc-dec | |
668 | ||
669 | Combine increments or decrements of addresses with memory accesses. | |
670 | This pass is always skipped on architectures that do not have | |
671 | instructions to support this. Enabled by default at :option:`-O1` and | |
672 | higher on architectures that support this. | |
673 | ||
674 | .. option:: -fdce | |
675 | ||
676 | Perform dead code elimination (DCE) on RTL. | |
677 | Enabled by default at :option:`-O1` and higher. | |
678 | ||
679 | .. option:: -fdse | |
680 | ||
681 | Perform dead store elimination (DSE) on RTL. | |
682 | Enabled by default at :option:`-O1` and higher. | |
683 | ||
684 | .. option:: -fif-conversion | |
685 | ||
686 | Attempt to transform conditional jumps into branch-less equivalents. This | |
687 | includes use of conditional moves, min, max, set flags and abs instructions, and | |
688 | some tricks doable by standard arithmetic. The use of conditional execution | |
689 | on chips where it is available is controlled by :option:`-fif-conversion2`. | |
690 | ||
691 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, :option:`-Os`, but | |
692 | not with :option:`-Og`. | |
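
As an illustration (the helper names are assumptions made for this sketch), the conditional jumps in the functions below are candidates for if-conversion. On a target with conditional move or max instructions, the compiler may emit them without any branch.

```c
/* Hypothetical helpers whose branches are candidates for if-conversion.  */

/* May become a max or conditional move instruction.  */
int max_int(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* May become a conditional move that selects either x or 0.  */
int clamp_negative_to_zero(int x)
{
    if (x < 0)
        x = 0;
    return x;
}
```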
693 | ||
694 | .. option:: -fif-conversion2 | |
695 | ||
696 | Use conditional execution (where available) to transform conditional jumps into | |
697 | branch-less equivalents. | |
698 | ||
699 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, :option:`-Os`, but | |
700 | not with :option:`-Og`. | |
701 | ||
702 | .. option:: -fdeclone-ctor-dtor | |
703 | ||
704 | The C++ ABI requires multiple entry points for constructors and | |
705 | destructors: one for a base subobject, one for a complete object, and | |
706 | one for a virtual destructor that calls operator delete afterwards. | |
707 | For a hierarchy with virtual bases, the base and complete variants are | |
708 | clones, which means two copies of the function. With this option, the | |
709 | base and complete variants are changed to be thunks that call a common | |
710 | implementation. | |
711 | ||
712 | Enabled by :option:`-Os`. | |
713 | ||
714 | .. option:: -fdelete-null-pointer-checks | |
715 | ||
716 | Assume that programs cannot safely dereference null pointers, and that | |
717 | no code or data element resides at address zero. | |
718 | This option enables simple constant | |
719 | folding optimizations at all optimization levels. In addition, other | |
720 | optimization passes in GCC use this flag to control global dataflow | |
721 | analyses that eliminate useless checks for null pointers; these assume | |
722 | that a memory access to address zero always results in a trap, so | |
723 | that if a pointer is checked after it has already been dereferenced, | |
724 | it cannot be null. | |
725 | ||
726 | Note however that in some environments this assumption is not true. | |
727 | Use :option:`-fno-delete-null-pointer-checks` to disable this optimization | |
728 | for programs that depend on that behavior. | |
729 | ||
730 | This option is enabled by default on most targets. On Nios II ELF, it | |
731 | defaults to off. On AVR and MSP430, this option is completely disabled. | |
732 | ||
733 | Passes that use the dataflow information | |
734 | are enabled independently at different optimization levels. | |
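
A minimal sketch of a check that this option allows the compiler to remove (the function name `first_or_default` is hypothetical): the pointer is dereferenced before it is tested, so the compiler may conclude that it cannot be null at the test.

```c
#include <stddef.h>

/* Hypothetical example: the null check comes after the dereference.  */
int first_or_default(const int *p)
{
    int value = *p;     /* dereference happens first */
    if (p == NULL)      /* may be deleted: p was already dereferenced */
        return -1;
    return value;
}
```

Code that must tolerate a null `p` should perform the check before the dereference rather than rely on :option:`-fno-delete-null-pointer-checks`.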
735 | ||
736 | .. option:: -fdevirtualize | |
737 | ||
738 | Attempt to convert calls to virtual functions to direct calls. This | |
739 | is done both within a procedure and interprocedurally as part of | |
740 | indirect inlining (:option:`-findirect-inlining`) and interprocedural constant | |
741 | propagation (:option:`-fipa-cp`). | |
742 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
743 | ||
744 | .. option:: -fdevirtualize-speculatively | |
745 | ||
746 | Attempt to convert calls to virtual functions to speculative direct calls. | |
747 | Based on the analysis of the type inheritance graph, determine for a given call | |
748 | the set of likely targets. If the set is small, preferably of size 1, change | |
749 | the call into a conditional deciding between direct and indirect calls. The | |
750 | speculative calls enable more optimizations, such as inlining. When they seem | |
751 | useless after further optimization, they are converted back into original form. | |
752 | ||
753 | .. option:: -fdevirtualize-at-ltrans | |
754 | ||
755 | Stream extra information needed for aggressive devirtualization when running | |
756 | the link-time optimizer in local transformation mode. | |
757 | This option enables more devirtualization but | |
758 | significantly increases the size of streamed data. For this reason it is | |
759 | disabled by default. | |
760 | ||
761 | .. option:: -fexpensive-optimizations | |
762 | ||
763 | Perform a number of minor optimizations that are relatively expensive. | |
764 | ||
765 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
766 | ||
767 | .. option:: -free | |
768 | ||
769 | Attempt to remove redundant extension instructions. This is especially | |
770 | helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit | |
771 | registers after writing to their lower 32-bit half. | |
772 | ||
773 | Enabled for Alpha, AArch64 and x86 at levels :option:`-O2`, | |
774 | :option:`-O3`, :option:`-Os`. | |
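
The sketch below shows where such an extension typically arises (the function name `add_and_widen` is an assumption for illustration). On x86-64 the 32-bit addition already clears the upper half of the destination register, so a separate zero-extension instruction for the cast would be redundant.

```c
#include <stdint.h>

/* Hypothetical example: a 32-bit result widened to 64 bits.  */
uint64_t add_and_widen(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;       /* 32-bit addition, wraps modulo 2^32 */
    return (uint64_t) sum;      /* zero extension to 64 bits */
}
```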
775 | ||
776 | .. option:: -fno-lifetime-dse | |
777 | ||
778 | In C++ the value of an object is only affected by changes within its | |
779 | lifetime: when the constructor begins, the object has an indeterminate | |
780 | value, and any changes during the lifetime of the object are dead when | |
781 | the object is destroyed. Normally dead store elimination will take | |
782 | advantage of this; if your code relies on the value of the object | |
783 | storage persisting beyond the lifetime of the object, you can use this | |
784 | flag to disable this optimization. To preserve stores before the | |
785 | constructor starts (e.g. because your operator new clears the object | |
786 | storage) but still treat the object as dead after the destructor, you | |
787 | can use :option:`-flifetime-dse=1`. The default behavior can be | |
788 | explicitly selected with :option:`-flifetime-dse=2`. | |
789 | :option:`-flifetime-dse=0` is equivalent to :option:`-fno-lifetime-dse`. | |
790 | ||
791 | .. option:: -flifetime-dse | |
792 | ||
793 | Default setting; overrides :option:`-fno-lifetime-dse`. | |
794 | ||
795 | .. option:: -flive-range-shrinkage | |
796 | ||
797 | Attempt to decrease register pressure through register live range | |
798 | shrinkage. This is helpful for fast processors with small or moderate | |
799 | size register sets. | |
800 | ||
801 | .. option:: -fira-algorithm={algorithm} | |
802 | ||
803 | Use the specified coloring algorithm for the integrated register | |
804 | allocator. The :samp:`{algorithm}` argument can be :samp:`priority`, which | |
805 | specifies Chow's priority coloring, or :samp:`CB`, which specifies | |
806 | Chaitin-Briggs coloring. Chaitin-Briggs coloring is not implemented | |
807 | for all architectures, but for those targets that do support it, it is | |
808 | the default because it generates better code. | |
809 | ||
810 | .. option:: -fira-region={region} | |
811 | ||
812 | Use specified regions for the integrated register allocator. The | |
813 | :samp:`{region}` argument should be one of the following: | |
814 | ||
815 | :samp:`all` | |
816 | Use all loops as register allocation regions. | |
817 | This can give the best results for machines with a small and/or | |
818 | irregular register set. | |
819 | ||
820 | :samp:`mixed` | |
821 | Use all loops except for loops with small register pressure | |
822 | as the regions. This value gives | |
823 | the best results in most cases and for most architectures, | |
824 | and is enabled by default when compiling with optimization for speed | |
825 | (:option:`-O`, :option:`-O2`, ...). | |
826 | ||
827 | :samp:`one` | |
828 | Use all functions as a single region. | |
829 | This typically results in the smallest code size, and is enabled by default for | |
830 | :option:`-Os` or :option:`-O0`. | |
831 | ||
832 | .. option:: -fira-hoist-pressure | |
833 | ||
834 | Use IRA to evaluate register pressure in the code hoisting pass for | |
835 | decisions to hoist expressions. This option usually results in smaller | |
836 | code, but it can slow the compiler down. | |
837 | ||
838 | This option is enabled at level :option:`-Os` for all targets. | |
839 | ||
840 | .. option:: -fira-loop-pressure | |
841 | ||
842 | Use IRA to evaluate register pressure in loops for decisions to move | |
843 | loop invariants. This option usually results in generation | |
844 | of faster and smaller code on machines with large register files (>= 32 | |
845 | registers), but it can slow the compiler down. | |
846 | ||
847 | This option is enabled at level :option:`-O3` for some targets. | |
848 | ||
849 | .. option:: -fno-ira-share-save-slots | |
850 | ||
851 | Disable sharing of stack slots used for saving call-used hard | |
852 | registers living through a call. Each hard register gets a | |
853 | separate stack slot, and as a result function stack frames are | |
854 | larger. | |
855 | ||
856 | .. option:: -fira-share-save-slots | |
857 | ||
858 | Default setting; overrides :option:`-fno-ira-share-save-slots`. | |
859 | ||
860 | .. option:: -fno-ira-share-spill-slots | |
861 | ||
862 | Disable sharing of stack slots allocated for pseudo-registers. Each | |
863 | pseudo-register that does not get a hard register gets a separate | |
864 | stack slot, and as a result function stack frames are larger. | |
865 | ||
866 | .. option:: -fira-share-spill-slots | |
867 | ||
868 | Default setting; overrides :option:`-fno-ira-share-spill-slots`. | |
869 | ||
870 | .. option:: -flra-remat | |
871 | ||
872 | Enable CFG-sensitive rematerialization in LRA. Instead of loading | |
873 | values of spilled pseudos, LRA tries to rematerialize (recalculate) | |
874 | values if it is profitable. | |
875 | ||
876 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
877 | ||
878 | .. option:: -fdelayed-branch | |
879 | ||
880 | If supported for the target machine, attempt to reorder instructions | |
881 | to exploit instruction slots available after delayed branch | |
882 | instructions. | |
883 | ||
884 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, :option:`-Os`, | |
885 | but not at :option:`-Og`. | |
886 | ||
887 | .. option:: -fschedule-insns | |
888 | ||
889 | If supported for the target machine, attempt to reorder instructions to | |
890 | eliminate execution stalls due to required data being unavailable. This | |
891 | helps machines that have slow floating point or memory load instructions | |
892 | by allowing other instructions to be issued until the result of the load | |
893 | or floating-point instruction is required. | |
894 | ||
895 | Enabled at levels :option:`-O2`, :option:`-O3`. | |
896 | ||
897 | .. option:: -fschedule-insns2 | |
898 | ||
899 | Similar to :option:`-fschedule-insns`, but requests an additional pass of | |
900 | instruction scheduling after register allocation has been done. This is | |
901 | especially useful on machines with a relatively small number of | |
902 | registers and where memory load instructions take more than one cycle. | |
903 | ||
904 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
905 | ||
906 | .. option:: -fno-sched-interblock | |
907 | ||
908 | Disable instruction scheduling across basic blocks, which | |
909 | is normally enabled when scheduling before register allocation, i.e. | |
910 | with :option:`-fschedule-insns` or at :option:`-O2` or higher. | |
911 | ||
912 | .. option:: -fsched-interblock | |
913 | ||
914 | Default setting; overrides :option:`-fno-sched-interblock`. | |
915 | ||
916 | .. option:: -fno-sched-spec | |
917 | ||
918 | Disable speculative motion of non-load instructions, which | |
919 | is normally enabled when scheduling before register allocation, i.e. | |
920 | with :option:`-fschedule-insns` or at :option:`-O2` or higher. | |
921 | ||
922 | .. option:: -fsched-spec | |
923 | ||
924 | Default setting; overrides :option:`-fno-sched-spec`. | |
925 | ||
926 | .. option:: -fsched-pressure | |
927 | ||
928 | Enable register pressure sensitive insn scheduling before register | |
929 | allocation. This only makes sense when scheduling before register | |
930 | allocation is enabled, i.e. with :option:`-fschedule-insns` or at | |
931 | :option:`-O2` or higher. Usage of this option can improve the | |
932 | generated code and decrease its size by preventing register pressure | |
933 | increase above the number of available hard registers and subsequent | |
934 | spills in register allocation. | |
935 | ||
936 | .. option:: -fsched-spec-load | |
937 | ||
938 | Allow speculative motion of some load instructions. This only makes | |
939 | sense when scheduling before register allocation, i.e. with | |
940 | :option:`-fschedule-insns` or at :option:`-O2` or higher. | |
941 | ||
942 | .. option:: -fsched-spec-load-dangerous | |
943 | ||
944 | Allow speculative motion of more load instructions. This only makes | |
945 | sense when scheduling before register allocation, i.e. with | |
946 | :option:`-fschedule-insns` or at :option:`-O2` or higher. | |
947 | ||
948 | .. option:: -fsched-stalled-insns, -fsched-stalled-insns={n} | |
949 | ||
950 | Define how many insns (if any) can be moved prematurely from the queue | |
951 | of stalled insns into the ready list during the second scheduling pass. | |
952 | :option:`-fno-sched-stalled-insns` means that no insns are moved | |
953 | prematurely, :option:`-fsched-stalled-insns=0` means there is no limit | |
954 | on how many queued insns can be moved prematurely. | |
955 | :option:`-fsched-stalled-insns` without a value is equivalent to | |
956 | :option:`-fsched-stalled-insns=1`. | |
957 | ||
958 | .. option:: -fsched-stalled-insns-dep, -fsched-stalled-insns-dep={n} | |
959 | ||
960 | Define how many insn groups (cycles) are examined for a dependency | |
961 | on a stalled insn that is a candidate for premature removal from the queue | |
962 | of stalled insns. This has an effect only during the second scheduling pass, | |
963 | and only if :option:`-fsched-stalled-insns` is used. | |
964 | :option:`-fno-sched-stalled-insns-dep` is equivalent to | |
965 | :option:`-fsched-stalled-insns-dep=0`. | |
966 | :option:`-fsched-stalled-insns-dep` without a value is equivalent to | |
967 | :option:`-fsched-stalled-insns-dep=1`. | |
968 | ||
969 | .. option:: -fsched2-use-superblocks | |
970 | ||
971 | When scheduling after register allocation, use superblock scheduling. | |
972 | This allows motion across basic block boundaries, | |
973 | resulting in faster schedules. This option is experimental, as not all machine | |
974 | descriptions used by GCC model the CPU closely enough to avoid unreliable | |
975 | results from the algorithm. | |
976 | ||
977 | This only makes sense when scheduling after register allocation, i.e. with | |
978 | :option:`-fschedule-insns2` or at :option:`-O2` or higher. | |
979 | ||
980 | .. option:: -fsched-group-heuristic | |
981 | ||
982 | Enable the group heuristic in the scheduler. This heuristic favors | |
983 | the instruction that belongs to a schedule group. This is enabled | |
984 | by default when scheduling is enabled, i.e. with :option:`-fschedule-insns` | |
985 | or :option:`-fschedule-insns2` or at :option:`-O2` or higher. | |
986 | ||
987 | .. option:: -fsched-critical-path-heuristic | |
988 | ||
989 | Enable the critical-path heuristic in the scheduler. This heuristic favors | |
990 | instructions on the critical path. This is enabled by default when | |
991 | scheduling is enabled, i.e. with :option:`-fschedule-insns` | |
992 | or :option:`-fschedule-insns2` or at :option:`-O2` or higher. | |
993 | ||
994 | .. option:: -fsched-spec-insn-heuristic | |
995 | ||
996 | Enable the speculative instruction heuristic in the scheduler. This | |
997 | heuristic favors speculative instructions with greater dependency weakness. | |
998 | This is enabled by default when scheduling is enabled, i.e. | |
999 | with :option:`-fschedule-insns` or :option:`-fschedule-insns2` | |
1000 | or at :option:`-O2` or higher. | |
1001 | ||
1002 | .. option:: -fsched-rank-heuristic | |
1003 | ||
1004 | Enable the rank heuristic in the scheduler. This heuristic favors | |
1005 | the instruction belonging to a basic block with greater size or frequency. | |
1006 | This is enabled by default when scheduling is enabled, i.e. | |
1007 | with :option:`-fschedule-insns` or :option:`-fschedule-insns2` or | |
1008 | at :option:`-O2` or higher. | |
1009 | ||
1010 | .. option:: -fsched-last-insn-heuristic | |
1011 | ||
1012 | Enable the last-instruction heuristic in the scheduler. This heuristic | |
1013 | favors the instruction that is less dependent on the last instruction | |
1014 | scheduled. This is enabled by default when scheduling is enabled, | |
1015 | i.e. with :option:`-fschedule-insns` or :option:`-fschedule-insns2` or | |
1016 | at :option:`-O2` or higher. | |
1017 | ||
1018 | .. option:: -fsched-dep-count-heuristic | |
1019 | ||
1020 | Enable the dependent-count heuristic in the scheduler. This heuristic | |
1021 | favors the instruction that has more instructions depending on it. | |
1022 | This is enabled by default when scheduling is enabled, i.e. | |
1023 | with :option:`-fschedule-insns` or :option:`-fschedule-insns2` or | |
1024 | at :option:`-O2` or higher. | |
1025 | ||
1026 | .. option:: -freschedule-modulo-scheduled-loops | |
1027 | ||
1028 | Modulo scheduling is performed before traditional scheduling. If a loop | |
1029 | is modulo scheduled, later scheduling passes may change its schedule. | |
1030 | Use this option to control that behavior. | |
1031 | ||
1032 | .. option:: -fselective-scheduling | |
1033 | ||
1034 | Schedule instructions using the selective scheduling algorithm. Selective | |
1035 | scheduling runs instead of the first scheduler pass. | |
1036 | ||
1037 | .. option:: -fselective-scheduling2 | |
1038 | ||
1039 | Schedule instructions using the selective scheduling algorithm. Selective | |
1040 | scheduling runs instead of the second scheduler pass. | |
1041 | ||
1042 | .. option:: -fsel-sched-pipelining | |
1043 | ||
1044 | Enable software pipelining of innermost loops during selective scheduling. | |
1045 | This option has no effect unless one of :option:`-fselective-scheduling` or | |
1046 | :option:`-fselective-scheduling2` is turned on. | |
1047 | ||
1048 | .. option:: -fsel-sched-pipelining-outer-loops | |
1049 | ||
1050 | When pipelining loops during selective scheduling, also pipeline outer loops. | |
1051 | This option has no effect unless :option:`-fsel-sched-pipelining` is turned on. | |
1052 | ||
1053 | .. option:: -fsemantic-interposition | |
1054 | ||
1055 | Some object formats, like ELF, allow interposing of symbols by the | |
1056 | dynamic linker. | |
1057 | This means that for symbols exported from the DSO, the compiler cannot perform | |
1058 | interprocedural propagation, inlining and other optimizations in anticipation | |
1059 | that the function or variable in question may change. While this feature is | |
1060 | useful, for example, to rewrite memory allocation functions by a debugging | |
1061 | implementation, it is expensive in terms of code quality. | |
1062 | With :option:`-fno-semantic-interposition` the compiler assumes that | |
1063 | if interposition happens for functions, the overwriting function will have | |
1064 | precisely the same semantics (and side effects). | |
1065 | Similarly, if interposition happens | |
1066 | for variables, the constructor of the variable will be the same. The flag | |
1067 | has no effect for functions explicitly declared inline | |
1068 | (where it is never allowed for interposition to change semantics) | |
1069 | and for symbols explicitly declared weak. | |
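
As a sketch of the situation described above (the function names are hypothetical), assume this file is compiled as position-independent code into a shared object. `scale` is exported, so by default the compiler must allow for another definition replacing it at run time and cannot inline it into `scale_twice`. With :option:`-fno-semantic-interposition` it may do so.

```c
/* Hypothetical exported function: in a shared object it can be
   interposed by a definition from another module.  */
int scale(int x)
{
    return x * 3;
}

/* Calls the exported function from within the same shared object.
   Inlining scale here is only valid if any interposing definition
   has the same semantics.  */
int scale_twice(int x)
{
    return scale(scale(x));
}
```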
1070 | ||
1071 | .. option:: -fshrink-wrap | |
1072 | ||
1073 | Emit function prologues only before parts of the function that need it, | |
1074 | rather than at the top of the function. This flag is enabled by default at | |
1075 | :option:`-O` and higher. | |
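
A minimal sketch of a function that can benefit (the name `total_length` is an assumption for illustration): the early return needs no saved registers or stack frame, so the prologue may be emitted only on the path that reaches the loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example with a cheap early-exit path.  */
size_t total_length(const char *const *strings, size_t count)
{
    if (count == 0)     /* fast path: no prologue needed here */
        return 0;

    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(strings[i]);    /* values live across calls */
    return total;
}
```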
1076 | ||
1077 | .. option:: -fshrink-wrap-separate | |
1078 | ||
1079 | Shrink-wrap separate parts of the prologue and epilogue separately, so that | |
1080 | those parts are only executed when needed. | |
1081 | This option is on by default, but has no effect unless :option:`-fshrink-wrap` | |
1082 | is also turned on and the target supports this. | |
1083 | ||
1084 | .. option:: -fcaller-saves | |
1085 | ||
1086 | Enable allocation of values to registers that are clobbered by | |
1087 | function calls, by emitting extra instructions to save and restore the | |
1088 | registers around such calls. Such allocation is done only when it | |
1089 | seems to result in better code. | |
1090 | ||
1091 | This option is always enabled by default on certain machines, usually | |
1092 | those which have no call-preserved registers to use instead. | |
1093 | ||
1094 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
1095 | ||
1096 | .. option:: -fcombine-stack-adjustments | |
1097 | ||
1098 | Track stack adjustments (pushes and pops) and stack memory references | |
1099 | and then try to find ways to combine them. | |
1100 | ||
1101 | Enabled by default at :option:`-O1` and higher. | |
1102 | ||
1103 | .. option:: -fipa-ra | |
1104 | ||
1105 | Use caller save registers for allocation if those registers are not used by | |
1106 | any called function. In that case it is not necessary to save and restore | |
1107 | them around calls. This is only possible if called functions are part of | |
1108 | the same compilation unit as the current function and they are compiled before it. | |
1109 | ||
1110 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. However, the option | |
1111 | is disabled if the generated code will be instrumented for profiling | |
1112 | (:option:`-p` or :option:`-pg`) or if the callee's register usage cannot be known | |
1113 | exactly (this happens on targets that do not expose prologues | |
1114 | and epilogues in RTL). | |
1115 | ||
1116 | .. option:: -fconserve-stack | |
1117 | ||
1118 | Attempt to minimize stack usage. The compiler attempts to use less | |
1119 | stack space, even if that makes the program slower. This option | |
1120 | implies setting the large-stack-frame parameter to 100 | |
1121 | and the large-stack-frame-growth parameter to 400. | |
1122 | ||
1123 | .. option:: -ftree-reassoc | |
1124 | ||
1125 | Perform reassociation on trees. This flag is enabled by default | |
1126 | at :option:`-O1` and higher. | |
1127 | ||
1128 | .. option:: -fcode-hoisting | |
1129 | ||
1130 | Perform code hoisting. Code hoisting tries to move the | |
1131 | evaluation of expressions executed on all paths to the function exit | |
1132 | as early as possible. This is especially useful as a code size | |
1133 | optimization, but it often helps for code speed as well. | |
1134 | This flag is enabled by default at :option:`-O2` and higher. | |
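
For illustration (the function `select_scaled` is a hypothetical example), the product `a * b` below is evaluated on both branches. Code hoisting may compute it once before the branch, which leaves a single multiplication in the generated code.

```c
/* Hypothetical example: the same expression appears on every path.  */
int select_scaled(int flag, int a, int b)
{
    int result;
    if (flag)
        result = a * b + 1;     /* a * b evaluated on this path ... */
    else
        result = a * b - 1;     /* ... and on this one */
    return result;
}
```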
1135 | ||
1136 | .. option:: -ftree-pre | |
1137 | ||
1138 | Perform partial redundancy elimination (PRE) on trees. This flag is | |
1139 | enabled by default at :option:`-O2` and :option:`-O3`. | |
1140 | ||
1141 | .. option:: -ftree-partial-pre | |
1142 | ||
1143 | Make partial redundancy elimination (PRE) more aggressive. This flag is | |
1144 | enabled by default at :option:`-O3`. | |
1145 | ||
1146 | .. option:: -ftree-forwprop | |
1147 | ||
1148 | Perform forward propagation on trees. This flag is enabled by default | |
1149 | at :option:`-O1` and higher. | |
1150 | ||
1151 | .. option:: -ftree-fre | |
1152 | ||
1153 | Perform full redundancy elimination (FRE) on trees. The difference | |
1154 | between FRE and PRE is that FRE only considers expressions | |
1155 | that are computed on all paths leading to the redundant computation. | |
1156 | This analysis is faster than PRE, though it exposes fewer redundancies. | |
1157 | This flag is enabled by default at :option:`-O1` and higher. | |
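
The difference can be sketched with two hypothetical functions. In the first, `a + b` has already been computed on every path that reaches the return statement, so the second evaluation is fully redundant. In the second, it has been computed on only one path, so the redundancy is partial and is left to PRE.

```c
/* Hypothetical examples contrasting full and partial redundancy.  */

/* a + b is available on all paths to the return: a case for FRE.  */
int fully_redundant(int a, int b, int flag)
{
    int x = a + b;
    if (flag)
        x += 1;
    return x + (a + b);
}

/* a + b is available only when flag is nonzero: a case for PRE.  */
int partially_redundant(int a, int b, int flag)
{
    int x = 0;
    if (flag)
        x = a + b;
    return x + (a + b);
}
```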
1158 | ||
1159 | .. option:: -ftree-phiprop | |
1160 | ||
1161 | Perform hoisting of loads from conditional pointers on trees. This | |
1162 | pass is enabled by default at :option:`-O1` and higher. | |
1163 | ||
1164 | .. option:: -fhoist-adjacent-loads | |
1165 | ||
1166 | Speculatively hoist loads from both branches of an if-then-else if the | |
1167 | loads are from adjacent locations in the same structure and the target | |
1168 | architecture has a conditional move instruction. This flag is enabled | |
1169 | by default at :option:`-O2` and higher. | |
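
A minimal sketch of the pattern (the structure `pair` and the function `pick` are assumptions for illustration): the two branches load adjacent fields of the same structure, so both loads may be performed before the test and the result chosen with a conditional move.

```c
/* Hypothetical structure with two adjacent fields.  */
struct pair {
    int first;
    int second;
};

/* Each branch loads one of two adjacent locations.  */
int pick(const struct pair *p, int use_first)
{
    if (use_first)
        return p->first;
    else
        return p->second;
}
```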
1170 | ||
1171 | .. option:: -ftree-copy-prop | |
1172 | ||
1173 | Perform copy propagation on trees. This pass eliminates unnecessary | |
1174 | copy operations. This flag is enabled by default at :option:`-O1` and | |
1175 | higher. | |
1176 | ||
1177 | .. option:: -fipa-pure-const | |
1178 | ||
1179 | Discover which functions are pure or constant. | |
1180 | Enabled by default at :option:`-O1` and higher. | |
1181 | ||
1182 | .. option:: -fipa-reference | |
1183 | ||
1184 | Discover which static variables do not escape the | |
1185 | compilation unit. | |
1186 | Enabled by default at :option:`-O1` and higher. | |
1187 | ||
1188 | .. option:: -fipa-reference-addressable | |
1189 | ||
1190 | Discover read-only, write-only and non-addressable static variables. | |
1191 | Enabled by default at :option:`-O1` and higher. | |
1192 | ||
1193 | .. option:: -fipa-stack-alignment | |
1194 | ||
1195 | Reduce stack alignment on call sites if possible. | |
1196 | Enabled by default. | |
1197 | ||
1198 | .. option:: -fipa-pta | |
1199 | ||
1200 | Perform interprocedural pointer analysis and interprocedural modification | |
1201 | and reference analysis. This option can cause excessive memory and | |
1202 | compile-time usage on large compilation units. It is not enabled by | |
1203 | default at any optimization level. | |
1204 | ||
1205 | .. option:: -fipa-profile | |
1206 | ||
1207 | Perform interprocedural profile propagation. The functions called only from | |
1208 | cold functions are marked as cold. Also functions executed once (such as | |
1209 | :fn-attr:`cold`, :fn-attr:`noreturn`, static constructors or destructors) are | |
1210 | identified. Cold functions and loopless parts of functions executed once are | |
1211 | then optimized for size. | |
1212 | Enabled by default at :option:`-O1` and higher. | |
1213 | ||
1214 | .. option:: -fipa-modref | |
1215 | ||
1216 | Perform interprocedural mod/ref analysis. This optimization analyzes the side | |
1217 | effects of functions (memory locations that are modified or referenced) and | |
1218 | enables better optimization across the function call boundary. This flag is | |
1219 | enabled by default at :option:`-O1` and higher. | |
1220 | ||
1221 | .. option:: -fipa-cp | |
1222 | ||
1223 | Perform interprocedural constant propagation. | |
1224 | This optimization analyzes the program to determine when values passed | |
1225 | to functions are constants and then optimizes accordingly. | |
1226 | This optimization can substantially increase performance | |
1227 | if the application has constants passed to functions. | |
1228 | This flag is enabled by default at :option:`-O2`, :option:`-Os` and :option:`-O3`. | |
1229 | It is also enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
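
As an illustrative sketch (all names are hypothetical), every call to `scale_by` below passes the constant 1000 as `factor`. Interprocedural constant propagation can detect this and optimize the body of `scale_by` as if `factor` were that constant. In a function this small, inlining would usually achieve the same effect first.

```c
/* Hypothetical helper that is always called with factor == 1000.  */
static int scale_by(int value, int factor)
{
    return value * factor;
}

int seconds_to_millis(int seconds)
{
    return scale_by(seconds, 1000);
}

int millis_to_micros(int millis)
{
    return scale_by(millis, 1000);
}
```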
1230 | ||
1231 | .. option:: -fipa-cp-clone | |
1232 | ||
1233 | Perform function cloning to make interprocedural constant propagation stronger. | |
1234 | When enabled, interprocedural constant propagation performs function cloning | |
1235 | when an externally visible function can be called with constant arguments. | |
1236 | Because this optimization can create multiple copies of functions, | |
1237 | it may significantly increase code size | |
1238 | (see :option:`--param ipa-cp-unit-growth=value`). | |
1239 | This flag is enabled by default at :option:`-O3`. | |
1240 | It is also enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
1241 | ||
1242 | .. option:: -fipa-bit-cp | |
1243 | ||
1244 | When enabled, perform interprocedural bitwise constant | |
1245 | propagation. This flag is enabled by default at :option:`-O2` and | |
1246 | by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
1247 | It requires that :option:`-fipa-cp` is enabled. | |
1248 | ||
1249 | .. option:: -fipa-vrp | |
1250 | ||
1251 | When enabled, perform interprocedural propagation of value | |
1252 | ranges. This flag is enabled by default at :option:`-O2`. It requires | |
1253 | that :option:`-fipa-cp` is enabled. | |
1254 | ||
1255 | .. option:: -fipa-icf | |
1256 | ||
1257 | Perform Identical Code Folding for functions and read-only variables. | |
1258 | The optimization reduces code size and may disturb unwind stacks by replacing | |
1259 | a function by an equivalent one with a different name. The optimization works | |
1260 | more effectively with link-time optimization enabled. | |
1261 | ||
1262 | Although the behavior is similar to the Gold Linker's ICF optimization, GCC ICF | |
1263 | works on different levels and thus the optimizations are not the same: there are | |
1264 | equivalences that are found only by GCC and equivalences found only by Gold. | |
1265 | ||
1266 | This flag is enabled by default at :option:`-O2` and :option:`-Os`. | |
1267 | ||
1268 | .. option:: -flive-patching={level} | |
1269 | ||
1270 | Control GCC's optimizations to produce output suitable for live-patching. | |
1271 | ||
1272 | If the compiler's optimization uses a function's body or information extracted | |
1273 | from its body to optimize/change another function, the latter is called an | |
1274 | impacted function of the former. If a function is patched, its impacted | |
1275 | functions should be patched too. | |
1276 | ||
1277 | The impacted functions are determined by the compiler's interprocedural | |
1278 | optimizations. For example, a caller is impacted when inlining a function | |
1279 | into its caller, | |
1280 | cloning a function and changing its caller to call this new clone, | |
1281 | or extracting a function's pureness/constness information to optimize | |
1282 | its direct or indirect callers, etc. | |
1283 | ||
1284 | Usually, the more IPA optimizations enabled, the larger the number of | |
1285 | impacted functions for each function. In order to control the number of | |
1286 | impacted functions and more easily compute the list of impacted functions, | |
1287 | IPA optimizations can be partially enabled at two different levels. | |
1288 | ||
1289 | The :samp:`{level}` argument should be one of the following: | |
1290 | ||
1291 | :samp:`inline-clone` | |
1292 | Only enable inlining and cloning optimizations, which includes inlining, | |
1293 | cloning, interprocedural scalar replacement of aggregates and partial inlining. | |
1294 | As a result, when patching a function, all its callers and its clones' | |
1295 | callers are impacted and therefore need to be patched as well. | |
1296 | ||
1297 | :option:`-flive-patching=inline-clone` disables the following optimization flags: | |
1298 | ||
1299 | :option:`-fwhole-program` :option:`-fipa-pta` :option:`-fipa-reference` :option:`-fipa-ra` |gol| | |
1300 | :option:`-fipa-icf` :option:`-fipa-icf-functions` :option:`-fipa-icf-variables` |gol| | |
1301 | :option:`-fipa-bit-cp` :option:`-fipa-vrp` :option:`-fipa-pure-const` :option:`-fipa-reference-addressable` |gol| | |
1302 | :option:`-fipa-stack-alignment` :option:`-fipa-modref` | |
1303 | ||
1304 | :samp:`inline-only-static` | |
1305 | Only enable inlining of static functions. | |
1306 | As a result, when patching a static function, all its callers are impacted | |
1307 | and so need to be patched as well. | |
1308 | ||
1309 | In addition to all the flags that :option:`-flive-patching=inline-clone` | |
1310 | disables, | |
1311 | :option:`-flive-patching=inline-only-static` disables the following additional | |
1312 | optimization flags: | |
1313 | ||
1314 | :option:`-fipa-cp-clone` :option:`-fipa-sra` :option:`-fpartial-inlining` :option:`-fipa-cp` | |
1315 | ||
1316 | When :option:`-flive-patching` is specified without any value, the default value | |
1317 | is :samp:`inline-clone`. | |
1318 | ||
1319 | This flag is disabled by default. | |
1320 | ||
1321 | Note that :option:`-flive-patching` is not supported with link-time optimization | |
1322 | (:option:`-flto`). | |
1323 | ||
1324 | .. option:: -fisolate-erroneous-paths-dereference | |
1325 | ||
1326 | Detect paths that trigger erroneous or undefined behavior due to | |
1327 | dereferencing a null pointer. Isolate those paths from the main control | |
1328 | flow and turn the statement with erroneous or undefined behavior into a trap. | |
1329 | This flag is enabled by default at :option:`-O2` and higher and depends on | |
1330 | :option:`-fdelete-null-pointer-checks` also being enabled. | |
1331 | ||
1332 | .. option:: -fisolate-erroneous-paths-attribute | |
1333 | ||
1334 | Detect paths that trigger erroneous or undefined behavior due to a null value | |
1335 | being used in a way forbidden by a :fn-attr:`returns_nonnull` or :fn-attr:`nonnull` | |
1336 | attribute. Isolate those paths from the main control flow and turn the | |
1337 | statement with erroneous or undefined behavior into a trap. This is not | |
1338 | currently enabled, but may be enabled by :option:`-O2` in the future. | |
1339 | ||
1340 | .. option:: -ftree-sink | |
1341 | ||
1342 | Perform forward store motion on trees. This flag is | |
1343 | enabled by default at :option:`-O1` and higher. | |
1344 | ||
1345 | .. option:: -ftree-bit-ccp | |
1346 | ||
1347 | Perform sparse conditional bit constant propagation on trees and propagate | |
1348 | pointer alignment information. | |
1349 | This pass only operates on local scalar variables and is enabled by default | |
1350 | at :option:`-O1` and higher, except for :option:`-Og`. | |
1351 | It requires that :option:`-ftree-ccp` is enabled. | |
1352 | ||
1353 | .. option:: -ftree-ccp | |
1354 | ||
1355 | Perform sparse conditional constant propagation (CCP) on trees. This | |
1356 | pass only operates on local scalar variables and is enabled by default | |
1357 | at :option:`-O1` and higher. | |
1358 | ||
1359 | .. option:: -fssa-backprop | |
1360 | ||
1361 | Propagate information about uses of a value up the definition chain | |
1362 | in order to simplify the definitions. For example, this pass strips | |
1363 | sign operations if the sign of a value never matters. The flag is | |
1364 | enabled by default at :option:`-O1` and higher. | |
1365 | ||
1366 | .. option:: -fssa-phiopt | |
1367 | ||
1368 | Perform pattern matching on SSA PHI nodes to optimize conditional | |
1369 | code. This pass is enabled by default at :option:`-O1` and higher, | |
1370 | except for :option:`-Og`. | |
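
As a rough illustration (the function names are hypothetical, and the code GCC actually emits depends on the target and GCC version), the kind of if/else diamond this pass looks for is one whose PHI node merely selects between two values:

```c
/* Hypothetical example: the branches only choose which value reaches
   the PHI node that feeds the return statement.  */
int clamp_to_limit (int value, int limit)
{
  int result;
  if (value < limit)
    result = value;
  else
    result = limit;
  return result;
}

/* Conceptually equivalent straight-line form: a MIN-style selection
   that needs no conditional jump.  */
int clamp_to_limit_branchless (int value, int limit)
{
  return value < limit ? value : limit;
}
```

Both functions return the same result for every input; the second only shows the shape of the code after the diamond has been collapsed.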
1371 | ||
1372 | .. option:: -ftree-switch-conversion | |
1373 | ||
1374 | Perform conversion of simple initializations in a switch to | |
1375 | initializations from a scalar array. This flag is enabled by default | |
1376 | at :option:`-O2` and higher. | |
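
A minimal sketch of the idea, using hypothetical function names: when every case of a switch only assigns a constant, the switch can be expressed as a bounds check plus a load from a constant array. The second function is a hand-written approximation of that result, not GCC output.

```c
/* Source form: each case is a simple initialization.  */
int days_in_quarter_switch (int quarter)
{
  int days;
  switch (quarter)
    {
    case 0: days = 90; break;
    case 1: days = 91; break;
    case 2: days = 92; break;
    case 3: days = 92; break;
    default: days = 0; break;
    }
  return days;
}

/* Conceptual result: a range check and an array lookup.  */
int days_in_quarter_table (int quarter)
{
  static const int table[4] = { 90, 91, 92, 92 };
  return (quarter >= 0 && quarter < 4) ? table[quarter] : 0;
}
```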
1377 | ||
1378 | .. option:: -ftree-tail-merge | |
1379 | ||
1380 | Look for identical code sequences. When found, replace one with a jump to the | |
1381 | other. This optimization is known as tail merging or cross jumping. This flag | |
1382 | is enabled by default at :option:`-O2` and higher. The compilation time | |
1383 | in this pass can | |
1384 | be limited using the ``max-tail-merge-comparisons`` and | |
1385 | ``max-tail-merge-iterations`` parameters. | |
1386 | ||
1387 | .. option:: -ftree-dce | |
1388 | ||
1389 | Perform dead code elimination (DCE) on trees. This flag is enabled by | |
1390 | default at :option:`-O1` and higher. | |
1391 | ||
1392 | .. option:: -ftree-builtin-call-dce | |
1393 | ||
1394 | Perform conditional dead code elimination (DCE) for calls to built-in functions | |
1395 | that may set ``errno`` but are otherwise free of side effects. This flag is | |
1396 | enabled by default at :option:`-O2` and higher if :option:`-Os` is not also | |
1397 | specified. | |
1398 | ||
1399 | .. option:: -ffinite-loops | |
1400 | ||
1401 | Assume that a loop with an exit will eventually take the exit and not loop | |
1402 | indefinitely. This allows the compiler to remove loops that otherwise have | |
1403 | no side-effects, not considering eventual endless looping as such. | |
1404 | ||
1405 | This option is enabled by default at :option:`-O2` for C++ with :option:`-std=c++11` | |
1406 | or higher. | |
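
The following sketch (hypothetical names) shows the kind of loop this assumption affects. The loop has an exit and no side effects, so under :option:`-ffinite-loops` the compiler may remove it entirely, even though it would never terminate on a cyclic list.

```c
#include <stddef.h>

struct node { struct node *next; };

/* Walks to the end of a list but produces no observable result.
   The loop has an exit (p == NULL) and no side effects, so it is a
   candidate for removal when loops are assumed to be finite.  */
void walk_list (const struct node *p)
{
  while (p != NULL)
    p = p->next;
}
```

In C the option has to be requested explicitly for this to apply, since the default described above is specific to C++.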
1407 | ||
1408 | .. option:: -fno-finite-loops | |
1409 | ||
1410 | Default setting; overrides :option:`-ffinite-loops`. | |
1411 | ||
1412 | .. option:: -ftree-dominator-opts | |
1413 | ||
1414 | Perform a variety of simple scalar cleanups (constant/copy | |
1415 | propagation, redundancy elimination, range propagation and expression | |
1416 | simplification) based on a dominator tree traversal. This also | |
1417 | performs jump threading (to reduce jumps to jumps). This flag is | |
1418 | enabled by default at :option:`-O1` and higher. | |
1419 | ||
1420 | .. option:: -ftree-dse | |
1421 | ||
1422 | Perform dead store elimination (DSE) on trees. A dead store is a store into | |
1423 | a memory location that is later overwritten by another store without | |
1424 | any intervening loads. In this case the earlier store can be deleted. This | |
1425 | flag is enabled by default at :option:`-O1` and higher. | |
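
For instance, in the following sketch (hypothetical names), the first store to ``p->x`` is dead because the same location is written again before anything reads it:

```c
struct point { int x, y; };

void reset_then_set (struct point *p, int x)
{
  p->x = 0;   /* dead store: overwritten below with no intervening load */
  p->y = 0;   /* not dead: this is the last store to p->y */
  p->x = x;   /* only this store to p->x needs to remain */
}
```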
1426 | ||
1427 | .. option:: -ftree-ch | |
1428 | ||
1429 | Perform loop header copying on trees. This is beneficial since it increases | |
1430 | effectiveness of code motion optimizations. It also saves one jump. This flag | |
1431 | is enabled by default at :option:`-O1` and higher. It is not enabled | |
1432 | for :option:`-Os`, since it usually increases code size. | |
1433 | ||
1434 | .. option:: -ftree-loop-optimize | |
1435 | ||
1436 | Perform loop optimizations on trees. This flag is enabled by default | |
1437 | at :option:`-O1` and higher. | |
1438 | ||
1439 | .. option:: -ftree-loop-linear, -floop-strip-mine, -floop-block | |
1440 | ||
1441 | Perform loop nest optimizations. Same as | |
1442 | :option:`-floop-nest-optimize`. To use this code transformation, GCC has | |
1443 | to be configured with :option:`--with-isl` to enable the Graphite loop | |
1444 | transformation infrastructure. | |
1445 | ||
1446 | .. option:: -fgraphite-identity | |
1447 | ||
1448 | Enable the identity transformation for graphite. For every SCoP we generate | |
1449 | the polyhedral representation and transform it back to gimple. Using | |
1450 | :option:`-fgraphite-identity` we can check the costs or benefits of the | |
1451 | GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations | |
1452 | are also performed by the code generator isl, like index splitting and | |
1453 | dead code elimination in loops. | |
1454 | ||
1455 | .. option:: -floop-nest-optimize | |
1456 | ||
1457 | Enable the isl based loop nest optimizer. This is a generic loop nest | |
1458 | optimizer based on the Pluto optimization algorithms. It calculates a loop | |
1459 | structure optimized for data-locality and parallelism. This option | |
1460 | is experimental. | |
1461 | ||
1462 | .. option:: -floop-parallelize-all | |
1463 | ||
1464 | Use the Graphite data dependence analysis to identify loops that can | |
1465 | be parallelized. Parallelize all the loops that can be analyzed to | |
1466 | not contain loop carried dependences without checking that it is | |
1467 | profitable to parallelize the loops. | |
1468 | ||
1469 | .. option:: -ftree-coalesce-vars | |
1470 | ||
1471 | While transforming the program out of the SSA representation, attempt to | |
1472 | reduce copying by coalescing versions of different user-defined | |
1473 | variables, instead of just compiler temporaries. This may severely | |
1474 | limit the ability to debug an optimized program compiled with | |
1475 | :option:`-fno-var-tracking-assignments`. In the negated form, this flag | |
1476 | prevents SSA coalescing of user variables. This option is enabled by | |
1477 | default if optimization is enabled, and it does very little otherwise. | |
1478 | ||
1479 | .. option:: -ftree-loop-if-convert | |
1480 | ||
1481 | Attempt to transform conditional jumps in the innermost loops to | |
1482 | branch-less equivalents. The intent is to remove control-flow from | |
1483 | the innermost loops in order to improve the ability of the | |
1484 | vectorization pass to handle these loops. This is enabled by default | |
1485 | if vectorization is enabled. | |
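
A hand-written approximation of the transformation, with hypothetical function names: the conditional store in the loop body is replaced by a single unconditional store of a selected value, which leaves the loop body free of control flow.

```c
#include <stddef.h>

/* Source form: a conditional jump inside the innermost loop.  */
void clip_negative (int *restrict out, const int *restrict in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (in[i] < 0)
        out[i] = 0;
      else
        out[i] = in[i];
    }
}

/* Conceptual if-converted form: one store per iteration, with the
   value chosen by a select rather than by a branch.  */
void clip_negative_branchless (int *restrict out, const int *restrict in,
                               size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] < 0 ? 0 : in[i];
}
```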
1486 | ||
1487 | .. option:: -ftree-loop-distribution | |
1488 | ||
1489 | Perform loop distribution. This flag can improve cache performance on | |
1490 | big loop bodies and allow further loop optimizations, like | |
1491 | parallelization or vectorization, to take place. For example, the loop | |
1492 | ||
1493 | .. code-block:: fortran | |
1494 | ||
1495 | DO I = 1, N | |
1496 | A(I) = B(I) + C | |
1497 | D(I) = E(I) * F | |
1498 | ENDDO | |
1499 | ||
1500 | is transformed to | |
1501 | ||
1502 | .. code-block:: fortran | |
1503 | ||
1504 | DO I = 1, N | |
1505 | A(I) = B(I) + C | |
1506 | ENDDO | |
1507 | DO I = 1, N | |
1508 | D(I) = E(I) * F | |
1509 | ENDDO | |
1510 | ||
1511 | This flag is enabled by default at :option:`-O3`. | |
1512 | It is also enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
1513 | ||
1514 | .. option:: -ftree-loop-distribute-patterns | |
1515 | ||
1516 | Perform loop distribution of patterns that can be code generated with | |
1517 | calls to a library. This flag is enabled by default at :option:`-O2` and | |
1518 | higher, and by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
1519 | ||
1520 | This pass distributes the initialization loops and generates a call to | |
1521 | ``memset`` that stores zeroes. For example, the loop | |
1522 | ||
1523 | .. code-block:: fortran | |
1524 | ||
1525 | DO I = 1, N | |
1526 | A(I) = 0 | |
1527 | B(I) = A(I) + I | |
1528 | ENDDO | |
1529 | ||
1530 | is transformed to | |
1531 | ||
1532 | .. code-block:: fortran | |
1533 | ||
1534 | DO I = 1, N | |
1535 | A(I) = 0 | |
1536 | ENDDO | |
1537 | DO I = 1, N | |
1538 | B(I) = A(I) + I | |
1539 | ENDDO | |
1540 | ||
1541 | and the initialization loop is transformed into a call to ``memset`` that stores zeroes. | |
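
The same pattern can be sketched in C (hypothetical function names). The second function is a hand-written stand-in for what the recognized initialization loop is replaced with:

```c
#include <stddef.h>
#include <string.h>

/* Source form: an initialization loop.  */
void clear_loop (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = 0;
}

/* Conceptual result: the loop becomes a library call.  */
void clear_memset (int *a, size_t n)
{
  memset (a, 0, n * sizeof *a);
}
```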
1544 | ||
1545 | .. option:: -floop-interchange | |
1546 | ||
1547 | Perform loop interchange outside of graphite. This flag can improve cache | |
1548 | performance on loop nests and allow further loop optimizations, like | |
1549 | vectorization, to take place. For example, the loop | |
1550 | ||
1551 | .. code-block:: c++ | |
1552 | ||
1553 | for (int i = 0; i < N; i++) | |
1554 | for (int j = 0; j < N; j++) | |
1555 | for (int k = 0; k < N; k++) | |
1556 | c[i][j] = c[i][j] + a[i][k]*b[k][j]; | |
1557 | ||
1558 | is transformed to | |
1559 | ||
1560 | .. code-block:: c++ | |
1561 | ||
1562 | for (int i = 0; i < N; i++) | |
1563 | for (int k = 0; k < N; k++) | |
1564 | for (int j = 0; j < N; j++) | |
1565 | c[i][j] = c[i][j] + a[i][k]*b[k][j]; | |
1566 | ||
1567 | This flag is enabled by default at :option:`-O3`. | |
1568 | It is also enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
1569 | ||
1570 | .. option:: -floop-unroll-and-jam | |
1571 | ||
1572 | Apply unroll and jam transformations on feasible loops. In a loop | |
1573 | nest this unrolls the outer loop by some factor and fuses the resulting | |
1574 | multiple inner loops. This flag is enabled by default at :option:`-O3`. | |
1575 | It is also enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
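
A sketch of the transformation on a small loop nest, written by hand with hypothetical names. The unroll factor of 2 and the even row count are assumptions chosen so that the sketch needs no remainder loop.

```c
#define ROWS 4   /* assumed even, so no remainder loop is needed */
#define COLS 3

/* Source form: a simple loop nest.  */
void row_sums (int sums[ROWS], int m[ROWS][COLS])
{
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      sums[i] += m[i][j];
}

/* Conceptual result: the outer loop is unrolled by 2 and the two
   resulting copies of the inner loop are fused ("jammed").  */
void row_sums_unroll_and_jam (int sums[ROWS], int m[ROWS][COLS])
{
  for (int i = 0; i < ROWS; i += 2)
    for (int j = 0; j < COLS; j++)
      {
        sums[i] += m[i][j];
        sums[i + 1] += m[i + 1][j];
      }
}
```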
1576 | ||
1577 | .. option:: -ftree-loop-im | |
1578 | ||
1579 | Perform loop invariant motion on trees. This pass moves only invariants that | |
1580 | are hard to handle at RTL level (function calls, operations that expand to | |
1581 | nontrivial sequences of insns). With :option:`-funswitch-loops` it also moves | |
1582 | operands of conditions that are invariant out of the loop, so that we can use | |
1583 | just trivial invariantness analysis in loop unswitching. The pass also includes | |
1584 | store motion. | |
1585 | ||
1586 | .. option:: -ftree-loop-ivcanon | |
1587 | ||
1588 | Create a canonical counter for the number of iterations in loops for which | |
1589 | determining the number of iterations requires complicated analysis. Later | |
1590 | optimizations then may determine the number easily. Useful especially | |
1591 | in connection with unrolling. | |
1592 | ||
1593 | .. option:: -ftree-scev-cprop | |
1594 | ||
1595 | Perform final value replacement. If a variable is modified in a loop | |
1596 | in such a way that its value when exiting the loop can be determined using | |
1597 | only its initial value and the number of loop iterations, replace uses of | |
1598 | the final value by such a computation, provided it is sufficiently cheap. | |
1599 | This reduces data dependencies and may allow further simplifications. | |
1600 | Enabled by default at :option:`-O1` and higher. | |
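
As an illustration with hypothetical names, the value of ``total`` on loop exit below depends only on its initial value and the iteration count, so uses of it after the loop can be replaced by a closed-form computation. Unsigned arithmetic is used so that both forms agree even when the sum wraps around.

```c
/* Source form: the final value is built up one iteration at a time.  */
unsigned accumulate_loop (unsigned start, unsigned n)
{
  unsigned total = start;
  for (unsigned i = 0; i < n; i++)
    total += 3;
  return total;
}

/* Conceptual result of final value replacement.  */
unsigned accumulate_closed_form (unsigned start, unsigned n)
{
  return start + 3u * n;
}
```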
1601 | ||
1602 | .. option:: -fivopts | |
1603 | ||
1604 | Perform induction variable optimizations (strength reduction, induction | |
1605 | variable merging and induction variable elimination) on trees. | |
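
A rough, hand-written sketch of strength reduction on an induction variable (hypothetical names; the choice GCC actually makes depends on its cost model for the target). The multiplication ``i * stride`` in the address computation is replaced by a pointer that is advanced by ``stride`` each iteration.

```c
#include <stddef.h>

/* Source form: the element address is recomputed with a multiply.  */
long sum_strided (const long *a, size_t count, size_t stride)
{
  long sum = 0;
  for (size_t i = 0; i < count; i++)
    sum += a[i * stride];
  return sum;
}

/* Conceptual result: a second induction variable (the pointer p)
   replaces the multiply.  This sketch assumes the array holds at
   least count * stride elements, so p never moves more than one
   element past the end of the array.  */
long sum_strided_reduced (const long *a, size_t count, size_t stride)
{
  long sum = 0;
  const long *p = a;
  for (size_t i = 0; i < count; i++)
    {
      sum += *p;
      p += stride;
    }
  return sum;
}
```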
1606 | ||
1607 | .. option:: -ftree-parallelize-loops=n | |
1608 | ||
1609 | Parallelize loops, i.e., split their iteration space to run in :samp:`{n}` threads. | |
1610 | This is only possible for loops whose iterations are independent | |
1611 | and can be arbitrarily reordered. The optimization is only | |
1612 | profitable on multiprocessor machines, for loops that are CPU-intensive, | |
1613 | rather than constrained e.g. by memory bandwidth. This option | |
1614 | implies :option:`-pthread`, and thus is only supported on targets | |
1615 | that have support for :option:`-pthread`. | |
1616 | ||
1617 | .. option:: -ftree-pta | |
1618 | ||
1619 | Perform function-local points-to analysis on trees. This flag is | |
1620 | enabled by default at :option:`-O1` and higher, except for :option:`-Og`. | |
1621 | ||
1622 | .. option:: -ftree-sra | |
1623 | ||
1624 | Perform scalar replacement of aggregates. This pass replaces structure | |
1625 | references with scalars to prevent committing structures to memory too | |
1626 | early. This flag is enabled by default at :option:`-O1` and higher, | |
1627 | except for :option:`-Og`. | |
1628 | ||
1629 | .. option:: -fstore-merging | |
1630 | ||
1631 | Perform merging of narrow stores to consecutive memory addresses. This pass | |
1632 | merges contiguous stores of immediate values narrower than a word into fewer | |
1633 | wider stores to reduce the number of instructions. This is enabled by default | |
1634 | at :option:`-O2` and higher as well as :option:`-Os`. | |
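
For example (hypothetical names), the four adjacent one-byte stores of constants below are the kind of sequence this pass can merge. The second function approximates the merged result with a single 4-byte copy, which compilers typically turn into one wider store.

```c
#include <stdint.h>
#include <string.h>

/* Source form: four narrow stores to consecutive addresses.  */
void write_magic_bytes (uint8_t *buf)
{
  buf[0] = 0x47;
  buf[1] = 0x43;
  buf[2] = 0x43;
  buf[3] = 0x21;
}

/* Conceptual result: the same bytes written as one 4-byte unit.  */
void write_magic_word (uint8_t *buf)
{
  static const uint8_t magic[4] = { 0x47, 0x43, 0x43, 0x21 };
  memcpy (buf, magic, sizeof magic);
}
```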
1635 | ||
1636 | .. option:: -ftree-ter | |
1637 | ||
1638 | Perform temporary expression replacement during the SSA->normal phase. Single | |
1639 | use/single def temporaries are replaced at their use location with their | |
1640 | defining expression. This results in non-GIMPLE code, but gives the expanders | |
1641 | much more complex trees to work on resulting in better RTL generation. This is | |
1642 | enabled by default at :option:`-O1` and higher. | |
1643 | ||
1644 | .. option:: -ftree-slsr | |
1645 | ||
1646 | Perform straight-line strength reduction on trees. This recognizes related | |
1647 | expressions involving multiplications and replaces them by less expensive | |
1648 | calculations when possible. This is enabled by default at :option:`-O1` and | |
1649 | higher. | |
1650 | ||
1651 | .. option:: -ftree-vectorize | |
1652 | ||
1653 | Perform vectorization on trees. This flag enables :option:`-ftree-loop-vectorize` | |
1654 | and :option:`-ftree-slp-vectorize` if not explicitly specified. | |
1655 | ||
1656 | .. option:: -ftree-loop-vectorize | |
1657 | ||
1658 | Perform loop vectorization on trees. This flag is enabled by default at | |
1659 | :option:`-O2` and by :option:`-ftree-vectorize`, :option:`-fprofile-use`, | |
1660 | and :option:`-fauto-profile`. | |
1661 | ||
1662 | .. option:: -ftree-slp-vectorize | |
1663 | ||
1664 | Perform basic block vectorization on trees. This flag is enabled by default at | |
1665 | :option:`-O2` and by :option:`-ftree-vectorize`, :option:`-fprofile-use`, | |
1666 | and :option:`-fauto-profile`. | |
1667 | ||
1668 | .. option:: -ftrivial-auto-var-init={choice} | |
1669 | ||
1670 | Initialize automatic variables with either a pattern or with zeroes to increase | |
1671 | the security and predictability of a program by preventing uninitialized memory | |
1672 | disclosure and use. | |
1673 | GCC still considers an automatic variable that doesn't have an explicit | |
1674 | initializer as uninitialized; :option:`-Wuninitialized` and | |
1675 | :option:`-Wanalyzer-use-of-uninitialized-value` still report | |
1676 | warnings for such automatic variables. | |
1677 | With this option, GCC also initializes any padding of automatic variables | |
1678 | that have structure or union types to zeroes. | |
1679 | However, the current implementation cannot initialize automatic variables that | |
1680 | are declared between the controlling expression and the first case of a | |
1681 | ``switch`` statement. Use :option:`-Wtrivial-auto-var-init` to report all | |
1682 | such cases. | |
1683 | ||
1684 | The three values of :samp:`{choice}` are: | |
1685 | ||
1686 | * :samp:`uninitialized` doesn't initialize any automatic variables. | |
1687 | This is C and C++'s default. | |
1688 | ||
1689 | * :samp:`pattern` Initialize automatic variables with values that are likely to | |
1690 | transform logic bugs into crashes down the line, are easily recognized in a | |
1691 | crash dump, and are not values that programmers can rely on for useful | |
1692 | program semantics. | |
1693 | The current value is a byte-repeatable pattern with byte ``0xFE``. | |
1694 | The values used for pattern initialization might be changed in the future. | |
1695 | ||
1696 | * :samp:`zero` Initialize automatic variables with zeroes. | |
1697 | ||
1698 | The default is :samp:`uninitialized`. | |
1699 | ||
1700 | You can control this behavior for a specific variable by using the variable | |
1701 | attribute :var-attr:`uninitialized` (see :ref:`variable-attributes`). | |
1702 | ||
1703 | .. option:: -fvect-cost-model={model} | |
1704 | ||
1705 | Alter the cost model used for vectorization. The :samp:`{model}` argument | |
1706 | should be one of :samp:`unlimited`, :samp:`dynamic`, :samp:`cheap` or | |
1707 | :samp:`very-cheap`. | |
1708 | With the :samp:`unlimited` model the vectorized code-path is assumed | |
1709 | to be profitable while with the :samp:`dynamic` model a runtime check | |
1710 | guards the vectorized code-path to enable it only for iteration | |
1711 | counts that will likely execute faster than when executing the original | |
1712 | scalar loop. The :samp:`cheap` model disables vectorization of | |
1713 | loops where doing so would be cost prohibitive for example due to | |
1714 | required runtime checks for data dependence or alignment but otherwise | |
1715 | is equal to the :samp:`dynamic` model. The :samp:`very-cheap` model only | |
1716 | allows vectorization if the vector code would entirely replace the | |
1717 | scalar code that is being vectorized. For example, if each iteration | |
1718 | of a vectorized loop would only be able to handle exactly four iterations | |
1719 | of the scalar loop, the :samp:`very-cheap` model would only allow | |
1720 | vectorization if the scalar iteration count is known to be a multiple | |
1721 | of four. | |
1722 | ||
1723 | The default cost model depends on other optimization flags and is | |
1724 | either :samp:`dynamic` or :samp:`cheap`. | |
1725 | ||
1726 | .. option:: -fsimd-cost-model={model} | |
1727 | ||
1728 | Alter the cost model used for vectorization of loops marked with the OpenMP | |
1729 | simd directive. The :samp:`{model}` argument should be one of | |
1730 | :samp:`unlimited`, :samp:`dynamic`, :samp:`cheap`. All values of :samp:`{model}` | |
1731 | have the same meaning as described in :option:`-fvect-cost-model` and by | |
1732 | default a cost model defined with :option:`-fvect-cost-model` is used. | |
1733 | ||
1734 | .. option:: -ftree-vrp | |
1735 | ||
1736 | Perform Value Range Propagation on trees. This is similar to the | |
1737 | constant propagation pass, but instead of values, ranges of values are | |
1738 | propagated. This allows the optimizers to remove unnecessary range | |
1739 | checks like array bound checks and null pointer checks. This is | |
1740 | enabled by default at :option:`-O2` and higher. Null pointer check | |
1741 | elimination is only done if :option:`-fdelete-null-pointer-checks` is | |
1742 | enabled. | |
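
A small sketch of a range check that becomes redundant once value ranges are known (hypothetical names). After the mask, ``i`` is known to lie in the range 0 to 15, so the comparison against 16 is always false and the check can be removed.

```c
int table_lookup (const int table[16], unsigned index)
{
  unsigned i = index & 15u;   /* i is now known to be in [0, 15] */
  if (i >= 16u)               /* always false given that range */
    return -1;
  return table[i];
}
```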
1743 | ||
1744 | .. option:: -fsplit-paths | |
1745 | ||
1746 | Split paths leading to loop backedges. This can improve dead code | |
1747 | elimination and common subexpression elimination. This is enabled by | |
1748 | default at :option:`-O3` and above. | |
1749 | ||
1750 | .. option:: -fsplit-ivs-in-unroller | |
1751 | ||
1752 | Enables expression of values of induction variables in later iterations | |
1753 | of the unrolled loop using the value in the first iteration. This breaks | |
1754 | long dependency chains, thus improving efficiency of the scheduling passes. | |
1755 | ||
1756 | A combination of :option:`-fweb` and CSE is often sufficient to obtain the | |
1757 | same effect. However, that is not reliable in cases where the loop body | |
1758 | is more complicated than a single basic block. It also does not work at all | |
1759 | on some architectures due to restrictions in the CSE pass. | |
1760 | ||
1761 | This optimization is enabled by default. | |
1762 | ||
1763 | .. option:: -fvariable-expansion-in-unroller | |
1764 | ||
1765 | With this option, the compiler creates multiple copies of some | |
1766 | local variables when unrolling a loop, which can result in superior code. | |
1767 | ||
1768 | This optimization is enabled by default for PowerPC targets, but disabled | |
1769 | by default otherwise. | |
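
The effect can be approximated by hand as follows (hypothetical names, unroll factor 2 assumed). Giving each unrolled copy of the loop body its own accumulator shortens the dependency chain; unsigned arithmetic is used here so that regrouping the additions cannot change the result.

```c
#include <stddef.h>

/* Source form: every addition depends on the previous one.  */
unsigned sum_single (const unsigned *a, size_t n)
{
  unsigned sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Conceptual result after unrolling by 2 with variable expansion:
   two independent accumulators that are combined at the end.  */
unsigned sum_expanded (const unsigned *a, size_t n)
{
  unsigned s0 = 0, s1 = 0;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      s0 += a[i];
      s1 += a[i + 1];
    }
  if (i < n)          /* leftover element when n is odd */
    s0 += a[i];
  return s0 + s1;
}
```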
1770 | ||
1771 | .. option:: -fpartial-inlining | |
1772 | ||
1773 | Inline parts of functions. This option has an effect only | |
1774 | when inlining itself is turned on by the :option:`-finline-functions` | |
1775 | or :option:`-finline-small-functions` options. | |
1776 | ||
1777 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
1778 | ||
1779 | .. option:: -fpredictive-commoning | |
1780 | ||
1781 | Perform predictive commoning optimization, i.e., reusing computations | |
1782 | (especially memory loads and stores) performed in previous | |
1783 | iterations of loops. | |
1784 | ||
1785 | This option is enabled at level :option:`-O3`. | |
1786 | It is also enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
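
A hand-written sketch of the idea (hypothetical names): in the first loop, the value loaded as ``b[i + 1]`` in one iteration is loaded again as ``b[i]`` in the next. The second version carries it over in a local variable, so each element of ``b`` is loaded only once. Both functions assume ``b`` holds at least ``n + 1`` elements.

```c
#include <stddef.h>

/* Source form: two loads per iteration, one of them redundant.  */
void pair_sums (int *restrict a, const int *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = b[i] + b[i + 1];
}

/* Conceptual result: the load from the previous iteration is reused.  */
void pair_sums_commoned (int *restrict a, const int *restrict b, size_t n)
{
  if (n == 0)
    return;
  int prev = b[0];
  for (size_t i = 0; i < n; i++)
    {
      int next = b[i + 1];
      a[i] = prev + next;
      prev = next;
    }
}
```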
1787 | ||
1788 | .. option:: -fprefetch-loop-arrays | |
1789 | ||
1790 | If supported by the target machine, generate instructions to prefetch | |
1791 | memory to improve the performance of loops that access large arrays. | |
1792 | ||
1793 | This option may generate better or worse code; results are highly | |
1794 | dependent on the structure of loops within the source code. | |
1795 | ||
1796 | Disabled at level :option:`-Os`. | |
1797 | ||
1798 | .. option:: -fno-printf-return-value | |
1799 | ||
1800 | Do not substitute constants for known return value of formatted output | |
1801 | functions such as ``sprintf``, ``snprintf``, ``vsprintf``, and | |
1802 | ``vsnprintf`` (but not ``printf`` or ``fprintf``). This | |
1803 | transformation allows GCC to optimize or even eliminate branches based | |
1804 | on the known return value of these functions called with arguments that | |
1805 | are either constant, or whose values are known to be in a range that | |
1806 | makes determining the exact return value possible. For example, when | |
1807 | :option:`-fprintf-return-value` is in effect, both the branch and the | |
1808 | body of the ``if`` statement (but not the call to ``snprintf``) | |
1809 | can be optimized away when ``i`` is a 32-bit or smaller integer | |
1810 | because the return value is guaranteed to be at most 8. | |
1811 | ||
1812 | .. code-block:: c++ | |
1813 | ||
1814 | char buf[9]; | |
1815 | if (snprintf (buf, sizeof buf, "%08x", i) >= sizeof buf) | |
1816 | ... | |
1817 | ||
1818 | The :option:`-fprintf-return-value` option relies on other optimizations | |
1819 | and yields best results with :option:`-O2` and above. It works in tandem | |
1820 | with the :option:`-Wformat-overflow` and :option:`-Wformat-truncation` | |
1821 | options. The :option:`-fprintf-return-value` option is enabled by default. | |
1822 | ||
1823 | .. option:: -fprintf-return-value | |
1824 | ||
1825 | Default setting; overrides :option:`-fno-printf-return-value`. | |
1826 | ||
1827 | .. option:: -fno-peephole, -fno-peephole2, -fpeephole, -fpeephole2 | |
1828 | ||
1829 | Disable any machine-specific peephole optimizations. The difference | |
1830 | between :option:`-fno-peephole` and :option:`-fno-peephole2` is in how they | |
1831 | are implemented in the compiler; some targets use one, some use the | |
1832 | other, a few use both. | |
1833 | ||
1834 | :option:`-fpeephole` is enabled by default. | |
1835 | :option:`-fpeephole2` is enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
1836 | ||
1837 | .. option:: -fno-guess-branch-probability | |
1838 | ||
1839 | Do not guess branch probabilities using heuristics. | |
1840 | ||
1841 | GCC uses heuristics to guess branch probabilities if they are | |
1842 | not provided by profiling feedback (:option:`-fprofile-arcs`). These | |
1843 | heuristics are based on the control flow graph. If some branch probabilities | |
1844 | are specified by ``__builtin_expect``, then the heuristics are | |
1845 | used to guess branch probabilities for the rest of the control flow graph, | |
1846 | taking the ``__builtin_expect`` info into account. The interactions | |
1847 | between the heuristics and ``__builtin_expect`` can be complex, and in | |
1848 | some cases, it may be useful to disable the heuristics so that the effects | |
1849 | of ``__builtin_expect`` are easier to understand. | |
1850 | ||
1851 | It is also possible to specify expected probability of the expression | |
1852 | with ``__builtin_expect_with_probability`` built-in function. | |
1853 | ||
1854 | The default is :option:`-fguess-branch-probability` at levels | |
1855 | :option:`-O`, :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
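
A minimal sketch of how the two built-ins mentioned above are typically used. The ``unlikely`` macro, the function name, and the probability value 0.9 are illustrative assumptions, not part of GCC.

```c
#include <stddef.h>

/* Hypothetical convenience macro around the built-in.  */
#define unlikely(x) __builtin_expect (!!(x), 0)

/* Sums n values into *out; returns -1 on invalid arguments.  */
int checked_sum (const int *values, size_t n, int *out)
{
  /* Tell the compiler that the error path is rarely taken.  */
  if (unlikely (values == NULL || out == NULL))
    return -1;

  int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += values[i];

  /* State an explicit probability (here 90%) that n is small.  */
  if (__builtin_expect_with_probability (n < 1024, 1, 0.9))
    *out = sum;
  else
    *out = sum;   /* same result; only the layout hint differs */
  return 0;
}
```

With :option:`-fno-guess-branch-probability`, only such explicit hints remain, which can make their effect on the generated code easier to isolate.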
1856 | ||
1857 | .. option:: -fguess-branch-probability | |
1858 | ||
1859 | Default setting; overrides :option:`-fno-guess-branch-probability`. | |
1860 | ||
1861 | .. option:: -freorder-blocks | |
1862 | ||
1863 | Reorder basic blocks in the compiled function in order to reduce the number of | |
1864 | taken branches and improve code locality. | |
1865 | ||
1866 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
1867 | ||
1868 | .. option:: -freorder-blocks-algorithm={algorithm} | |
1869 | ||
1870 | Use the specified algorithm for basic block reordering. The | |
1871 | :samp:`{algorithm}` argument can be :samp:`simple`, which does not increase | |
1872 | code size (except sometimes due to secondary effects like alignment), | |
1873 | or :samp:`stc`, the 'software trace cache' algorithm, which tries to | |
1874 | put all often executed code together, minimizing the number of branches | |
1875 | executed by making extra copies of code. | |
1876 | ||
1877 | The default is :samp:`simple` at levels :option:`-O1`, :option:`-Os`, and | |
1878 | :samp:`stc` at levels :option:`-O2`, :option:`-O3`. | |
1879 | ||
1880 | .. option:: -freorder-blocks-and-partition | |
1881 | ||
1882 | In addition to reordering basic blocks in the compiled function, in order | |
1883 | to reduce the number of taken branches, partition hot and cold basic blocks | |
1884 | into separate sections of the assembly and :samp:`.o` files, to improve | |
1885 | paging and cache locality performance. | |
1886 | ||
1887 | This optimization is automatically turned off in the presence of | |
1888 | exception handling or unwind tables (on targets using ``setjmp``/``longjmp`` or a target-specific scheme), for linkonce sections, for functions with a user-defined | |
1889 | section attribute and on any architecture that does not support named | |
1890 | sections. When :option:`-fsplit-stack` is used this option is not | |
1891 | enabled by default (to avoid linker errors), but may be enabled | |
1892 | explicitly (if using a working linker). | |
1893 | ||
1894 | Enabled for x86 at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
1895 | ||
1896 | .. option:: -freorder-functions | |
1897 | ||
1898 | Reorder functions in the object file in order to | |
1899 | improve code locality. This is implemented by using special | |
1900 | subsections ``.text.hot`` for most frequently executed functions and | |
1901 | ``.text.unlikely`` for functions that are unlikely to be executed. Reordering is done by | |
1902 | the linker, so the object file format must support named sections and the linker must | |
1903 | place them in a reasonable way. | |
1904 | ||
1905 | This option isn't effective unless you either provide profile feedback | |
1906 | (see :option:`-fprofile-arcs` for details) or manually annotate functions with | |
1907 | :fn-attr:`hot` or :fn-attr:`cold` attributes (see :ref:`common-function-attributes`). | |
1908 | ||
1909 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
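
A short sketch of manual annotation with the two attributes named above. The function names are hypothetical; whether the functions actually end up in ``.text.hot`` and ``.text.unlikely`` depends on the target and the linker.

```c
#include <stdio.h>

/* Rarely executed error path: a candidate for .text.unlikely.  */
__attribute__ ((cold)) void report_failure (const char *what)
{
  fprintf (stderr, "failure: %s\n", what);
}

/* Frequently executed kernel: a candidate for .text.hot.  */
__attribute__ ((hot)) int dot3 (const int a[3], const int b[3])
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```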
1910 | ||
1911 | .. _type-punning: | |
1912 | ||
1913 | .. option:: -fstrict-aliasing | |
1914 | ||
1915 | Allow the compiler to assume the strictest aliasing rules applicable to | |
1916 | the language being compiled. For C (and C++), this activates | |
1917 | optimizations based on the type of expressions. In particular, an | |
1918 | object of one type is assumed never to reside at the same address as an | |
1919 | object of a different type, unless the types are almost the same. For | |
1920 | example, an ``unsigned int`` can alias an ``int``, but not a | |
1921 | ``void*`` or a ``double``. A character type may alias any other | |
1922 | type. | |
1923 | ||
1924 | Pay special attention to code like this: | |
1925 | ||
1926 | .. code-block:: c++ | |
1927 | ||
1928 | union a_union { | |
1929 | int i; | |
1930 | double d; | |
1931 | }; | |
1932 | ||
1933 | int f() { | |
1934 | union a_union t; | |
1935 | t.d = 3.0; | |
1936 | return t.i; | |
1937 | } | |
1938 | ||
1939 | The practice of reading from a different union member than the one most | |
1940 | recently written to (called 'type-punning') is common. Even with | |
1941 | :option:`-fstrict-aliasing`, type-punning is allowed, provided the memory | |
1942 | is accessed through the union type. So, the code above works as | |
1943 | expected. See :ref:`structures-unions-enumerations-and-bit-fields-implementation`. However, this code might not: | |
1944 | ||
1945 | .. code-block:: c++ | |
1946 | ||
1947 | int f() { | |
1948 | union a_union t; | |
1949 | int* ip; | |
1950 | t.d = 3.0; | |
1951 | ip = &t.i; | |
1952 | return *ip; | |
1953 | } | |
1954 | ||
1955 | Similarly, access by taking the address, casting the resulting pointer | |
1956 | and dereferencing the result has undefined behavior, even if the cast | |
1957 | uses a union type, e.g.: | |
1958 | ||
1959 | .. code-block:: c++ | |
1960 | ||
1961 | int f() { | |
1962 | double d = 3.0; | |
1963 | return ((union a_union *) &d)->i; | |
1964 | } | |
1965 | ||
1966 | The :option:`-fstrict-aliasing` option is enabled at levels | |
1967 | :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
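
When the bytes of an object need to be reinterpreted without going through a union, copying them with ``memcpy`` avoids the aliasing problem, because no object is accessed through an lvalue of an incompatible type. The sketch below uses a hypothetical function name and assumes a 32-bit ``float``; the expected bit pattern additionally assumes IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t),
                "this sketch assumes a 32-bit float");

/* Returns the object representation of f as an integer.  */
uint32_t float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);   /* byte copy, no type-based access */
  return bits;
}
```

On such a target, ``float_bits (1.0f)`` yields ``0x3f800000``. Compilers commonly reduce a small fixed-size ``memcpy`` like this to a plain register move.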
1968 | ||
1969 | .. option:: -fipa-strict-aliasing | |
1970 | ||
1971 | Controls whether rules of :option:`-fstrict-aliasing` are applied across | |
1972 | function boundaries. Note that if multiple functions get inlined into a | |
1973 | single function the memory accesses are no longer considered to be crossing a | |
1974 | function boundary. | |
1975 | ||
1976 | The :option:`-fipa-strict-aliasing` option is enabled by default and is | |
1977 | effective only in combination with :option:`-fstrict-aliasing`. | |
1978 | ||
1979 | .. option:: -falign-functions[={n}[:{m}[:{n2}[:{m2}]]]] | |
1980 | ||
1981 | Align the start of functions to the next power-of-two greater than or | |
1982 | equal to :samp:`{n}`, skipping up to :samp:`{m}` -1 bytes. This ensures that at | |
1983 | least the first :samp:`{m}` bytes of the function can be fetched by the CPU | |
1984 | without crossing an :samp:`{n}` -byte alignment boundary. | |
1985 | ||
1986 | If :samp:`{m}` is not specified, it defaults to :samp:`{n}`. | |
1987 | ||
1988 | Examples: :option:`-falign-functions=32` aligns functions to the next | |
1989 | 32-byte boundary, :option:`-falign-functions=24` aligns to the next | |
1990 | 32-byte boundary only if this can be done by skipping 23 bytes or less, | |
1991 | :option:`-falign-functions=32:7` aligns to the next | |
1992 | 32-byte boundary only if this can be done by skipping 6 bytes or less. | |
1993 | ||
1994 | The second pair of :samp:`{n2}:{m2}` values allows you to specify | |
1995 | a secondary alignment: :option:`-falign-functions=64:7:32:3` aligns to | |
1996 | the next 64-byte boundary if this can be done by skipping 6 bytes or less, | |
1997 | otherwise aligns to the next 32-byte boundary if this can be done | |
1998 | by skipping 2 bytes or less. | |
1999 | If :samp:`{m2}` is not specified, it defaults to :samp:`{n2}`. | |
2000 | ||
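The skip rule described above can be modeled with a few lines of arithmetic. The sketch below is an illustration only, not GCC or assembler code: the function names are hypothetical, and passing ``0`` for ``m`` (or for ``n2``) is a convention of this sketch meaning "not specified".

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next power of two (n must be at least 1). */
static uint64_t round_up_pow2(uint64_t n) {
    assert(n >= 1);
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Try one n:m pair.  Returns 1 and advances *addr if the boundary can
   be reached by skipping at most m - 1 bytes, otherwise returns 0 and
   leaves *addr unchanged.  m == 0 means "default to n". */
static int try_align(uint64_t *addr, uint64_t n, uint64_t m) {
    uint64_t align = round_up_pow2(n);
    uint64_t pad = (align - (*addr % align)) % align;
    if (m == 0)
        m = n;
    if (pad > m - 1)
        return 0;
    *addr += pad;
    return 1;
}

/* Model of -falign-functions=n:m:n2:m2 for a function that would
   otherwise start at addr.  n2 == 0 means no secondary alignment. */
uint64_t place_function(uint64_t addr, uint64_t n, uint64_t m,
                        uint64_t n2, uint64_t m2) {
    if (!try_align(&addr, n, m) && n2 != 0)
        try_align(&addr, n2, m2);
    return addr;
}
```

In this model, ``place_function(30, 64, 7, 32, 3)`` cannot reach the 64-byte boundary within 6 bytes, so it falls back to the 32-byte boundary, which is 2 bytes away.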
2001 | Some assemblers only support this flag when :samp:`{n}` is a power of two; | |
2002 | in that case, it is rounded up. | |
2003 | ||
2004 | :option:`-fno-align-functions` and :option:`-falign-functions=1` are | |
2005 | equivalent and mean that functions are not aligned. | |
2006 | ||
2007 | If :samp:`{n}` is not specified or is zero, use a machine-dependent default. | |
2008 | The maximum allowed :samp:`{n}` option value is 65536. | |
2009 | ||
2010 | Enabled at levels :option:`-O2`, :option:`-O3`. | |
2011 | ||
2012 | .. option:: -flimit-function-alignment | |
2013 | ||
2014 | If this option is enabled, the compiler tries to avoid unnecessarily | |
2015 | overaligning functions. It attempts to instruct the assembler to align | |
2016 | by the amount specified by :option:`-falign-functions`, but not to | |
2017 | skip more bytes than the size of the function. | |
2018 | ||
2019 | .. option:: -falign-labels[={n}[:{m}[:{n2}[:{m2}]]]] | |
2020 | ||
2021 | Align all branch targets to a power-of-two boundary. | |
2022 | ||
2023 | Parameters of this option are analogous to the :option:`-falign-functions` option. | |
2024 | :option:`-fno-align-labels` and :option:`-falign-labels=1` are | |
2025 | equivalent and mean that labels are not aligned. | |
2026 | ||
2027 | If :option:`-falign-loops` or :option:`-falign-jumps` are applicable and | |
2028 | are greater than this value, then their values are used instead. | |
2029 | ||
2030 | If :samp:`{n}` is not specified or is zero, use a machine-dependent default | |
2031 | which is very likely to be :samp:`1`, meaning no alignment. | |
2032 | The maximum allowed :samp:`{n}` option value is 65536. | |
2033 | ||
2034 | Enabled at levels :option:`-O2`, :option:`-O3`. | |
2035 | ||
2036 | .. option:: -falign-loops[={n}[:{m}[:{n2}[:{m2}]]]] | |
2037 | ||
2038 | Align loops to a power-of-two boundary. If the loops are executed | |
2039 | many times, this makes up for any execution of the dummy padding | |
2040 | instructions. | |
2041 | ||
2042 | If :option:`-falign-labels` is greater than this value, then its value | |
2043 | is used instead. | |
2044 | ||
2045 | Parameters of this option are analogous to the :option:`-falign-functions` option. | |
2046 | :option:`-fno-align-loops` and :option:`-falign-loops=1` are | |
2047 | equivalent and mean that loops are not aligned. | |
2048 | The maximum allowed :samp:`{n}` option value is 65536. | |
2049 | ||
2050 | If :samp:`{n}` is not specified or is zero, use a machine-dependent default. | |
2051 | ||
2052 | Enabled at levels :option:`-O2`, :option:`-O3`. | |
2053 | ||
2054 | .. option:: -falign-jumps[={n}[:{m}[:{n2}[:{m2}]]]] | |
2055 | ||
2056 | Align branch targets to a power-of-two boundary, for branch targets | |
2057 | where the targets can only be reached by jumping. In this case, | |
2058 | no dummy operations need be executed. | |
2059 | ||
2060 | If :option:`-falign-labels` is greater than this value, then its value | |
2061 | is used instead. | |
2062 | ||
2063 | Parameters of this option are analogous to the :option:`-falign-functions` option. | |
2064 | :option:`-fno-align-jumps` and :option:`-falign-jumps=1` are | |
2065 | equivalent and mean that jump targets are not aligned. | |
2066 | ||
2067 | If :samp:`{n}` is not specified or is zero, use a machine-dependent default. | |
2068 | The maximum allowed :samp:`{n}` option value is 65536. | |
2069 | ||
2070 | Enabled at levels :option:`-O2`, :option:`-O3`. | |
2071 | ||
2072 | .. option:: -fno-allocation-dce | |
2073 | ||
2074 | Do not remove unused C++ allocations in dead code elimination. | |
2075 | ||
2076 | .. option:: -fallow-store-data-races | |
2077 | ||
2078 | Allow the compiler to perform optimizations that may introduce new data races | |
2079 | on stores, without proving that the variable cannot be concurrently accessed | |
2080 | by other threads. Does not affect optimization of local data. It is safe to | |
2081 | use this option if it is known that global data will not be accessed by | |
2082 | multiple threads. | |
2083 | ||
2084 | Examples of optimizations enabled by :option:`-fallow-store-data-races` include | |
2085 | hoisting or if-conversions that may cause a value that was already in memory | |
2086 | to be re-written with that same value. Such re-writing is safe in a single | |
2087 | threaded context but may be unsafe in a multi-threaded context. Note that on | |
2088 | some processors, if-conversions may be required in order to enable | |
2089 | vectorization. | |
2090 | ||
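The if-conversion case can be pictured with two hand-written functions. This is a sketch of the kind of transformation meant, not actual compiler output, and the function names are hypothetical. The first function stores only when the condition holds; the second stores unconditionally and, when the condition is false, writes back the value that was already in memory.

```c
#include <assert.h>
#include <stddef.h>

/* As written in the source: the store happens only when cond is true. */
void store_if(int *p, int cond, int v) {
    assert(p != NULL);
    if (cond)
        *p = v;
}

/* Hand-written model of an if-converted form: the store is always
   executed.  When cond is false, *p is re-written with its old value,
   which is the new store that may race with another thread. */
void store_if_converted(int *p, int cond, int v) {
    assert(p != NULL);
    int old = *p;
    *p = cond ? v : old;
}
```

In a single thread the two functions leave ``*p`` in the same state. If another thread updates ``*p`` between the load and the store in the second form, that update can be lost, which is why such rewrites are safe only when the data is known not to be shared.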
2091 | Enabled at level :option:`-Ofast`. | |
2092 | ||
2093 | .. option:: -funit-at-a-time | |
2094 | ||
2095 | This option is left for compatibility reasons. :option:`-funit-at-a-time` | |
2096 | has no effect, while :option:`-fno-unit-at-a-time` implies | |
2097 | :option:`-fno-toplevel-reorder` and :option:`-fno-section-anchors`. | |
2098 | ||
2099 | Enabled by default. | |
2100 | ||
2101 | .. option:: -fno-toplevel-reorder | |
2102 | ||
2103 | Do not reorder top-level functions, variables, and ``asm`` | |
2104 | statements. Output them in the same order that they appear in the | |
2105 | input file. When this option is used, unreferenced static variables | |
2106 | are not removed. This option is intended to support existing code | |
2107 | that relies on a particular ordering. For new code, it is better to | |
2108 | use attributes when possible. | |
2109 | ||
2110 | :option:`-ftoplevel-reorder` is the default at :option:`-O1` and higher, and | |
2111 | also at :option:`-O0` if :option:`-fsection-anchors` is explicitly requested. | |
2112 | Additionally :option:`-fno-toplevel-reorder` implies | |
2113 | :option:`-fno-section-anchors`. | |
2114 | ||
2115 | .. option:: -ftoplevel-reorder | |
2116 | ||
2117 | Default setting; overrides :option:`-fno-toplevel-reorder`. | |
2118 | ||
2119 | .. option:: -funreachable-traps | |
2120 | ||
2121 | With this option, the compiler turns calls to | |
2122 | ``__builtin_unreachable`` into traps, instead of using them for | |
2123 | optimization. This also affects any such calls implicitly generated | |
2124 | by the compiler. | |
2125 | ||
2126 | This option has the same effect as :option:`-fsanitize=unreachable | |
2127 | -fsanitize-trap=unreachable`, but does not affect the values of those | |
2128 | options. If :option:`-fsanitize=unreachable` is enabled, that option | |
2129 | takes priority over this one. | |
2130 | ||
2131 | This option is enabled by default at :option:`-O0` and :option:`-Og`. | |
2132 | ||
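As an illustration, the sketch below contains an explicit ``__builtin_unreachable`` call of the kind this option affects. The enumeration and function names are hypothetical. Without this option the compiler may assume the call is never reached; with it, reaching the call traps instead.

```c
#include <assert.h>

enum color { RED, GREEN, BLUE };

/* Every valid enumerator returns from inside the switch, so the code
   after it is marked as unreachable.  With -funreachable-traps that
   call becomes a trap if c holds an out-of-range value. */
int color_rank(enum color c) {
    switch (c) {
    case RED:
        return 0;
    case GREEN:
        return 1;
    case BLUE:
        return 2;
    }
    __builtin_unreachable();
}

/* Validate untrusted input before converting it to the enum type. */
int color_rank_checked(int raw) {
    assert(raw >= RED && raw <= BLUE);
    return color_rank((enum color) raw);
}
```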
2133 | .. option:: -fweb | |
2134 | ||
2135 | Construct webs as commonly used for register allocation purposes and assign | |
2136 | each web an individual pseudo register. This allows the register allocation pass | |
2137 | to operate on pseudos directly, but also strengthens several other optimization | |
2138 | passes, such as CSE, the loop optimizer and the trivial dead code remover. It can, | |
2139 | however, make debugging impossible, since variables no longer stay in a | |
2140 | 'home register'. | |
2141 | ||
2142 | Enabled by default with :option:`-funroll-loops`. | |
2143 | ||
2144 | .. option:: -fwhole-program | |
2145 | ||
2146 | Assume that the current compilation unit represents the whole program being | |
2147 | compiled. All public functions and variables with the exception of ``main`` | |
2148 | and those marked with attribute :fn-attr:`externally_visible` become static | |
2149 | and, as a result, are optimized more aggressively by interprocedural optimizers. | |
2150 | ||
2151 | This option should not be used in combination with :option:`-flto`. | |
2152 | Instead, relying on a linker plugin should provide safer and more precise | |
2153 | information. | |
2154 | ||
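A small sketch of how the attribute is used with :option:`-fwhole-program` follows. The function names are hypothetical. ``helper`` has no attribute and is therefore treated as if it were static, while ``plugin_entry`` keeps its external visibility.

```c
#include <assert.h>

/* No attribute: with -fwhole-program this function is treated as if it
   had been declared static, so it can be inlined and then dropped. */
int helper(int x) {
    return x * 2;
}

/* Hypothetical entry point that code outside this unit looks up by
   name; the attribute keeps the symbol externally visible. */
__attribute__((externally_visible))
int plugin_entry(int x) {
    assert(x >= 0);
    return helper(x) + 1;
}
```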
2155 | .. option:: -flto[={n}] | |
2156 | ||
2157 | This option runs the standard link-time optimizer. When invoked | |
2158 | with source code, it generates GIMPLE (one of GCC's internal | |
2159 | representations) and writes it to special ELF sections in the object | |
2160 | file. When the object files are linked together, all the function | |
2161 | bodies are read from these ELF sections and instantiated as if they | |
2162 | had been part of the same translation unit. | |
2163 | ||
2164 | To use the link-time optimizer, :option:`-flto` and optimization | |
2165 | options should be specified at compile time and during the final link. | |
2166 | It is recommended that you compile all the files participating in the | |
2167 | same link with the same options and also specify those options at | |
2168 | link time. | |
2169 | For example: | |
2170 | ||
2171 | .. code-block:: shell | |
2172 | ||
2173 | gcc -c -O2 -flto foo.c | |
2174 | gcc -c -O2 -flto bar.c | |
2175 | gcc -o myprog -flto -O2 foo.o bar.o | |
2176 | ||
2177 | The first two invocations to GCC save a bytecode representation | |
2178 | of GIMPLE into special ELF sections inside :samp:`foo.o` and | |
2179 | :samp:`bar.o`. The final invocation reads the GIMPLE bytecode from | |
2180 | :samp:`foo.o` and :samp:`bar.o`, merges the two files into a single | |
2181 | internal image, and compiles the result as usual. Since both | |
2182 | :samp:`foo.o` and :samp:`bar.o` are merged into a single image, this | |
2183 | causes all the interprocedural analyses and optimizations in GCC to | |
2184 | work across the two files as if they were a single one. This means, | |
2185 | for example, that the inliner is able to inline functions in | |
2186 | :samp:`bar.o` into functions in :samp:`foo.o` and vice-versa. | |
2187 | ||
2188 | Another (simpler) way to enable link-time optimization is: | |
2189 | ||
2190 | .. code-block:: shell | |
2191 | ||
2192 | gcc -o myprog -flto -O2 foo.c bar.c | |
2193 | ||
2194 | The above generates bytecode for :samp:`foo.c` and :samp:`bar.c`, | |
2195 | merges them together into a single GIMPLE representation and optimizes | |
2196 | them as usual to produce :samp:`myprog`. | |
2197 | ||
2198 | The important thing to keep in mind is that to enable link-time | |
2199 | optimizations you need to use the GCC driver to perform the link step. | |
2200 | GCC automatically performs link-time optimization if any of the | |
2201 | objects involved were compiled with the :option:`-flto` command-line option. | |
2202 | You can always override | |
2203 | the automatic decision to do link-time optimization | |
2204 | by passing :option:`-fno-lto` to the link command. | |
2205 | ||
2206 | To make whole program optimization effective, it is necessary to make | |
2207 | certain whole program assumptions. The compiler needs to know | |
2208 | what functions and variables can be accessed by libraries and runtime | |
2209 | outside of the link-time optimized unit. When supported by the linker, | |
2210 | the linker plugin (see :option:`-fuse-linker-plugin`) passes information | |
2211 | to the compiler about used and externally visible symbols. When | |
2212 | the linker plugin is not available, :option:`-fwhole-program` should be | |
2213 | used to allow the compiler to make these assumptions, which leads | |
2214 | to more aggressive optimization decisions. | |
2215 | ||
2216 | When a file is compiled with :option:`-flto` without | |
2217 | :option:`-fuse-linker-plugin`, the generated object file is larger than | |
2218 | a regular object file because it contains GIMPLE bytecodes and the usual | |
2219 | final code (see :option:`-ffat-lto-objects`). This means that | |
2220 | object files with LTO information can be linked as normal object | |
2221 | files; if :option:`-fno-lto` is passed to the linker, no | |
2222 | interprocedural optimizations are applied. Note that when | |
2223 | :option:`-fno-fat-lto-objects` is enabled the compile stage is faster | |
2224 | but you cannot perform a regular, non-LTO link on them. | |
2225 | ||
2226 | When producing the final binary, GCC only | |
2227 | applies link-time optimizations to those files that contain bytecode. | |
2228 | Therefore, you can mix and match object files and libraries with | |
2229 | GIMPLE bytecodes and final object code. GCC automatically selects | |
2230 | which files to optimize in LTO mode and which files to link without | |
2231 | further processing. | |
2232 | ||
2233 | Generally, options specified at link time override those | |
2234 | specified at compile time, although in some cases GCC attempts to infer | |
2235 | link-time options from the settings used to compile the input files. | |
2236 | ||
2237 | If you do not specify an optimization level option :option:`-O` at | |
2238 | link time, then GCC uses the highest optimization level | |
2239 | used when compiling the object files. Note that it is generally | |
2240 | ineffective to specify an optimization level option only at link time and | |
2241 | not at compile time, for two reasons. First, compiling without | |
2242 | optimization suppresses compiler passes that gather information | |
2243 | needed for effective optimization at link time. Second, some early | |
2244 | optimization passes can be performed only at compile time and | |
2245 | not at link time. | |
2246 | ||
2247 | There are some code generation flags preserved by GCC when | |
2248 | generating bytecodes, as they need to be used during the final link. | |
2249 | Currently, the following options and their settings are taken from | |
2250 | the first object file that explicitly specifies them: | |
2251 | :option:`-fcommon`, :option:`-fexceptions`, :option:`-fnon-call-exceptions`, | |
2252 | :option:`-fgnu-tm` and all the :option:`-m` target flags. | |
2253 | ||
2254 | The following options :option:`-fPIC`, :option:`-fpic`, :option:`-fpie` and | |
2255 | :option:`-fPIE` are combined based on the following scheme: | |
2256 | ||
2257 | .. list-table:: | |
2258 | :header-rows: 1 | |
2259 | ||
2260 | * - argument 1 | |
2261 | - argument 2 | |
2262 | - output | |
2263 | ||
2264 | * - :option:`-fPIC` | |
2265 | - :option:`-fpic` | |
2266 | - :option:`-fpic` | |
2267 | * - :option:`-fPIC` | |
2268 | - :option:`-fno-pic` | |
2269 | - :option:`-fno-pic` | |
2270 | * - :option:`-fpic`/:option:`-fPIC` | |
2271 | - no option | |
2272 | - no option | |
2273 | * - :option:`-fPIC` | |
2274 | - :option:`-fPIE` | |
2275 | - :option:`-fPIE` | |
2276 | * - :option:`-fpic` | |
2277 | - :option:`-fPIE` | |
2278 | - :option:`-fpie` | |
2279 | * - :option:`-fPIC`/:option:`-fpic` | |
2280 | - :option:`-fpie` | |
2281 | - :option:`-fpie` | |
2282 | ||
2283 | Certain ABI-changing flags are required to match in all compilation units, | |
2284 | and trying to override this at link time with a conflicting value | |
2285 | is ignored. This includes options such as :option:`-freg-struct-return` | |
2286 | and :option:`-fpcc-struct-return`. | |
2287 | ||
2288 | Other options such as :option:`-ffp-contract`, :option:`-fno-strict-overflow`, | |
2289 | :option:`-fwrapv`, :option:`-fno-trapv` or :option:`-fno-strict-aliasing` | |
2290 | are passed through to the link stage and merged conservatively for | |
2291 | conflicting translation units. Specifically | |
2292 | :option:`-fno-strict-overflow`, :option:`-fwrapv` and :option:`-fno-trapv` take | |
2293 | precedence; and for example :option:`-ffp-contract=off` takes precedence | |
2294 | over :option:`-ffp-contract=fast`. You can override them at link time. | |
2295 | ||
2296 | Diagnostic options such as :option:`-Wstringop-overflow` are passed | |
2297 | through to the link stage and their setting matches that of the | |
2298 | compile-step at function granularity. Note that this matters only | |
2299 | for diagnostics emitted during optimization. Also, code | |
2300 | transforms such as inlining can lead to warnings being enabled | |
2301 | or disabled for regions of code in a way not consistent with the setting | |
2302 | at compile time. | |
2303 | ||
2304 | When you need to pass options to the assembler via :option:`-Wa` or | |
2305 | :option:`-Xassembler` make sure to either compile such translation | |
2306 | units with :option:`-fno-lto` or consistently use the same assembler | |
2307 | options on all translation units. You can alternatively also | |
2308 | specify assembler options at LTO link time. | |
2309 | ||
2310 | To enable debug info generation you need to supply :option:`-g` at | |
2311 | compile time. If any of the input files at link time were built | |
2312 | with debug info generation enabled the link will enable debug info | |
2313 | generation as well. Any elaborate debug info settings | |
2314 | like the DWARF level :option:`-gdwarf-5` need to be explicitly repeated | |
2315 | at the linker command line and mixing different settings in different | |
2316 | translation units is discouraged. | |
2317 | ||
2318 | If LTO encounters objects with C linkage declared with incompatible | |
2319 | types in separate translation units to be linked together (undefined | |
2320 | behavior according to ISO C99 6.2.7), a non-fatal diagnostic may be | |
2321 | issued. The behavior is still undefined at run time. Similar | |
2322 | diagnostics may be raised for other languages. | |
2323 | ||
2324 | Another feature of LTO is that it is possible to apply interprocedural | |
2325 | optimizations on files written in different languages: | |
2326 | ||
2327 | .. code-block:: shell | |
2328 | ||
2329 | gcc -c -flto foo.c | |
2330 | g++ -c -flto bar.cc | |
2331 | gfortran -c -flto baz.f90 | |
2332 | g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran | |
2333 | ||
2334 | Notice that the final link is done with :command:`g++` to get the C++ | |
2335 | runtime libraries and :option:`-lgfortran` is added to get the Fortran | |
2336 | runtime libraries. In general, when mixing languages in LTO mode, you | |
2337 | should use the same link command options as when mixing languages in a | |
2338 | regular (non-LTO) compilation. | |
2339 | ||
2340 | If object files containing GIMPLE bytecode are stored in a library archive, say | |
2341 | :samp:`libfoo.a`, it is possible to extract and use them in an LTO link if you | |
2342 | are using a linker with plugin support. To create static libraries suitable | |
2343 | for LTO, use :command:`gcc-ar` and :command:`gcc-ranlib` instead of :command:`ar` | |
2344 | and :command:`ranlib`; | |
2345 | to show the symbols of object files with GIMPLE bytecode, use | |
2346 | :command:`gcc-nm`. Those commands require that :command:`ar`, :command:`ranlib` | |
2347 | and :command:`nm` have been compiled with plugin support. At link time, use the | |
2348 | flag :option:`-fuse-linker-plugin` to ensure that the library participates in | |
2349 | the LTO optimization process: | |
2350 | ||
2351 | .. code-block:: shell | |
2352 | ||
2353 | gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo | |
2354 | ||
2355 | With the linker plugin enabled, the linker extracts the needed | |
2356 | GIMPLE files from :samp:`libfoo.a` and passes them on to the running GCC | |
2357 | to make them part of the aggregated GIMPLE image to be optimized. | |
2358 | ||
2359 | If you are not using a linker with plugin support and/or do not | |
2360 | enable the linker plugin, then the objects inside :samp:`libfoo.a` | |
2361 | are extracted and linked as usual, but they do not participate | |
2362 | in the LTO optimization process. In order to make a static library suitable | |
2363 | for both LTO optimization and usual linkage, compile its object files with | |
2364 | :option:`-flto` :option:`-ffat-lto-objects`. | |
2365 | ||
2366 | Link-time optimizations do not require the presence of the whole program to | |
2367 | operate. If the program does not require any symbols to be exported, it is | |
2368 | possible to combine :option:`-flto` and :option:`-fwhole-program` to allow | |
2369 | the interprocedural optimizers to use more aggressive assumptions which may | |
2370 | lead to improved optimization opportunities. | |
2371 | Use of :option:`-fwhole-program` is not needed when the linker plugin is | |
2372 | active (see :option:`-fuse-linker-plugin`). | |
2373 | ||
2374 | The current implementation of LTO makes no | |
2375 | attempt to generate bytecode that is portable between different | |
2376 | types of hosts. The bytecode files are versioned and there is a | |
2377 | strict version check, so bytecode files generated in one version of | |
2378 | GCC do not work with an older or newer version of GCC. | |
2379 | ||
2380 | Link-time optimization does not work well with generation of debugging | |
2381 | information on systems other than those using a combination of ELF and | |
2382 | DWARF. | |
2383 | ||
2384 | If you specify the optional :samp:`{n}`, the optimization and code | |
2385 | generation done at link time is executed in parallel using :samp:`{n}` | |
2386 | parallel jobs by utilizing an installed :command:`make` program. The | |
2387 | environment variable :envvar:`MAKE` may be used to override the program | |
2388 | used. | |
2389 | ||
2390 | You can also specify :option:`-flto=jobserver` to use GNU make's | |
2391 | job server mode to determine the number of parallel jobs. This | |
2392 | is useful when the Makefile calling GCC is already executing in parallel. | |
2393 | You must prepend a :samp:`+` to the command recipe in the parent Makefile | |
2394 | for this to work. This option likely only works if :envvar:`MAKE` is | |
2395 | GNU make. Even without the option value, GCC tries to automatically | |
2396 | detect a running GNU make's job server. | |
2397 | ||
2398 | Use :option:`-flto=auto` to use GNU make's job server, if available, | |
2399 | or otherwise fall back to autodetection of the number of CPU threads | |
2400 | present in your system. | |
2401 | ||
2402 | .. option:: -flto-partition={alg} | |
2403 | ||
2404 | Specify the partitioning algorithm used by the link-time optimizer. | |
2405 | The value is either :samp:`1to1` to specify a partitioning mirroring | |
2406 | the original source files or :samp:`balanced` to specify partitioning | |
2407 | into equally sized chunks (whenever possible) or :samp:`max` to create | |
2408 | a new partition for every symbol where possible. Specifying :samp:`none` | |
2409 | as an algorithm disables partitioning and streaming completely. | |
2410 | The default value is :samp:`balanced`. While :samp:`1to1` can be used | |
2411 | as a workaround for various code ordering issues, the :samp:`max` | |
2412 | partitioning is intended for internal testing only. | |
2413 | The value :samp:`one` specifies that exactly one partition should be | |
2414 | used while the value :samp:`none` bypasses partitioning and executes | |
2415 | the link-time optimization step directly from the WPA phase. | |
2416 | ||
2417 | .. option:: -flto-compression-level={n} | |
2418 | ||
2419 | This option specifies the level of compression used for intermediate | |
2420 | language written to LTO object files, and is only meaningful in | |
2421 | conjunction with LTO mode (:option:`-flto`). GCC currently supports two | |
2422 | LTO compression algorithms. For zstd, valid values are 0 (no compression) | |
2423 | to 19 (maximum compression), while zlib supports values from 0 to 9. | |
2424 | Values outside this range are clamped to either minimum or maximum | |
2425 | of the supported values. If the option is not given, | |
2426 | a default balanced compression setting is used. | |
2427 | ||
2428 | .. option:: -fuse-linker-plugin | |
2429 | ||
2430 | Enables the use of a linker plugin during link-time optimization. This | |
2431 | option relies on plugin support in the linker, which is available in gold | |
2432 | or in GNU ld 2.21 or newer. | |
2433 | ||
2434 | This option enables the extraction of object files with GIMPLE bytecode out | |
2435 | of library archives. This improves the quality of optimization by exposing | |
2436 | more code to the link-time optimizer. The plugin also tells the compiler which | |
2437 | symbols can be accessed externally (by non-LTO objects or during dynamic | |
2438 | linking). Resulting code quality improvements on binaries (and shared | |
2439 | libraries that use hidden visibility) are similar to :option:`-fwhole-program`. | |
2440 | See :option:`-flto` for a description of the effect of this flag and how to | |
2441 | use it. | |
2442 | ||
2443 | This option is enabled by default when LTO support in GCC is enabled | |
2444 | and GCC was configured for use with | |
2445 | a linker supporting plugins (GNU ld 2.21 or newer or gold). | |
2446 | ||
2447 | .. option:: -ffat-lto-objects | |
2448 | ||
2449 | Fat LTO objects are object files that contain both the intermediate language | |
2450 | and the object code. This makes them usable for both LTO linking and normal | |
2451 | linking. This option is effective only when compiling with :option:`-flto` | |
2452 | and is ignored at link time. | |
2453 | ||
2454 | :option:`-fno-fat-lto-objects` improves compilation time over plain LTO, but | |
2455 | requires the complete toolchain to be aware of LTO. It requires a linker with | |
2456 | linker plugin support for basic functionality. Additionally, | |
2457 | :command:`nm`, :command:`ar` and :command:`ranlib` | |
2458 | need to support linker plugins to allow a full-featured build environment | |
2459 | (capable of building static libraries etc). GCC provides the :command:`gcc-ar`, | |
2460 | :command:`gcc-nm`, :command:`gcc-ranlib` wrappers to pass the right options | |
2461 | to these tools. With non-fat LTO, makefiles need to be modified to use them. | |
2462 | ||
2463 | Note that modern binutils provide a plugin auto-load mechanism. | |
2464 | Installing the linker plugin into :samp:`$libdir/bfd-plugins` has the same | |
2465 | effect as usage of the command wrappers (:command:`gcc-ar`, :command:`gcc-nm` and | |
2466 | :command:`gcc-ranlib`). | |
2467 | ||
2468 | The default is :option:`-fno-fat-lto-objects` on targets with linker plugin | |
2469 | support. | |
2470 | ||
2471 | .. option:: -fcompare-elim | |
2472 | ||
2473 | After register allocation and post-register allocation instruction splitting, | |
2474 | identify arithmetic instructions that compute processor flags similar to a | |
2475 | comparison operation based on that arithmetic. If possible, eliminate the | |
2476 | explicit comparison operation. | |
2477 | ||
2478 | This pass only applies to certain targets that cannot explicitly represent | |
2479 | the comparison operation before register allocation is complete. | |
2480 | ||
2481 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
2482 | ||
2483 | .. option:: -fcprop-registers | |
2484 | ||
2485 | After register allocation and post-register allocation instruction splitting, | |
2486 | perform a copy-propagation pass to try to reduce scheduling dependencies | |
2487 | and occasionally eliminate the copy. | |
2488 | ||
2489 | Enabled at levels :option:`-O1`, :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
2490 | ||
2491 | .. option:: -fprofile-correction | |
2492 | ||
2493 | Profiles collected using an instrumented binary for multi-threaded programs may | |
2494 | be inconsistent due to missed counter updates. When this option is specified, | |
2495 | GCC uses heuristics to correct or smooth out such inconsistencies. By | |
2496 | default, GCC emits an error message when an inconsistent profile is detected. | |
2497 | ||
2498 | This option is enabled by :option:`-fauto-profile`. | |
2499 | ||
2500 | .. option:: -fprofile-partial-training | |
2501 | ||
2502 | With ``-fprofile-use`` all portions of programs not executed during the training | |
2503 | run are optimized aggressively for size rather than speed. In some cases it is | |
2504 | not practical to train all possible hot paths in the program. (For | |
2505 | example, a program may contain functions specific to given hardware, and | |
2506 | training may not cover all hardware configurations the program is run on.) With | |
2507 | ``-fprofile-partial-training`` profile feedback is ignored for all | |
2508 | functions not executed during the training run, leading them to be optimized as if | |
2509 | they were compiled without profile feedback. This leads to better performance | |
2510 | when the training run is not representative but also leads to significantly bigger | |
2511 | code. | |
2512 | ||
2513 | .. option:: -fprofile-use, -fprofile-use={path} | |
2514 | ||
2515 | Enable profile feedback-directed optimizations, | |
2516 | and the following optimizations, many of which | |
2517 | are generally profitable only with profile feedback available: | |
2518 | ||
2519 | :option:`-fbranch-probabilities` :option:`-fprofile-values` |gol| | |
2520 | :option:`-funroll-loops` :option:`-fpeel-loops` :option:`-ftracer` :option:`-fvpt` |gol| | |
2521 | :option:`-finline-functions` :option:`-fipa-cp` :option:`-fipa-cp-clone` :option:`-fipa-bit-cp` |gol| | |
2522 | :option:`-fpredictive-commoning` :option:`-fsplit-loops` :option:`-funswitch-loops` |gol| | |
2523 | :option:`-fgcse-after-reload` :option:`-ftree-loop-vectorize` :option:`-ftree-slp-vectorize` |gol| | |
2524 | :option:`-fvect-cost-model=dynamic` :option:`-ftree-loop-distribute-patterns` |gol| | |
2525 | :option:`-fprofile-reorder-functions` | |
2526 | ||
2527 | Before you can use this option, you must first generate profiling information. | |
2528 | See :ref:`instrumentation-options`, for information about the | |
2529 | :option:`-fprofile-generate` option. | |
2530 | ||
2531 | By default, GCC emits an error message if the feedback profiles do not | |
2532 | match the source code. This error can be turned into a warning by using | |
2533 | :option:`-Wno-error=coverage-mismatch`. Note this may result in poorly | |
2534 | optimized code. Additionally, by default, GCC also emits a warning message if | |
2535 | the feedback profiles do not exist (see :option:`-Wmissing-profile`). | |
2536 | ||
2537 | If :samp:`{path}` is specified, GCC looks at the :samp:`{path}` to find | |
2538 | the profile feedback data files. See :option:`-fprofile-dir`. | |
2539 | ||
2540 | .. option:: -fauto-profile, -fauto-profile={path} | |
2541 | ||
2542 | Enable sampling-based feedback-directed optimizations, | |
2543 | and the following optimizations, | |
2544 | many of which are generally profitable only with profile feedback available: | |
2545 | ||
2546 | :option:`-fbranch-probabilities` :option:`-fprofile-values` |gol| | |
2547 | :option:`-funroll-loops` :option:`-fpeel-loops` :option:`-ftracer` :option:`-fvpt` |gol| | |
2548 | :option:`-finline-functions` :option:`-fipa-cp` :option:`-fipa-cp-clone` :option:`-fipa-bit-cp` |gol| | |
2549 | :option:`-fpredictive-commoning` :option:`-fsplit-loops` :option:`-funswitch-loops` |gol| | |
2550 | :option:`-fgcse-after-reload` :option:`-ftree-loop-vectorize` :option:`-ftree-slp-vectorize` |gol| | |
2551 | :option:`-fvect-cost-model=dynamic` :option:`-ftree-loop-distribute-patterns` |gol| | |
2552 | :option:`-fprofile-correction` | |
2553 | ||
2554 | :samp:`{path}` is the name of a file containing AutoFDO profile information. | |
2555 | If omitted, it defaults to :samp:`fbdata.afdo` in the current directory. | |
2556 | ||
2557 | Producing an AutoFDO profile data file requires running your program | |
2558 | with the :command:`perf` utility on a supported GNU/Linux target system. | |
2559 | For more information, see https://perf.wiki.kernel.org/. | |
2560 | ||
2561 | For example: | |
2562 ||
2563 | .. code-block:: shell | |
2564 | ||
2565 | perf record -e br_inst_retired:near_taken -b -o perf.data \ | |
2566 | -- your_program | |
2567 | ||
2568 | Then use the :command:`create_gcov` tool to convert the raw profile data | |
2569 | to a format that can be used by GCC. You must also supply the | |
2570 | unstripped binary for your program to this tool. | |
2571 | See https://github.com/google/autofdo. | |
2572 | ||
2573 | For example: | |
2574 ||
2575 | .. code-block:: shell | |
2576 | ||
2577 | create_gcov --binary=your_program.unstripped --profile=perf.data \ | |
2578 | --gcov=profile.afdo | |
2579 | ||
2580 | The following options control compiler behavior regarding floating-point | |
2581 | arithmetic. These options trade off between speed and | |
2582 | correctness. All must be specifically enabled. | |
2583 | ||
2584 | .. option:: -ffloat-store | |
2585 | ||
2586 | Do not store floating-point variables in registers, and inhibit other | |
2587 | options that might change whether a floating-point value is taken from a | |
2588 | register or memory. | |
2589 | ||
2590 | .. index:: floating-point precision | |
2591 | ||
2592 | This option prevents undesirable excess precision on machines such as | |
2593 | the 68000 where the floating registers (of the 68881) keep more | |
2594 | precision than a ``double`` is supposed to have. Similarly for the | |
2595 | x86 architecture. For most programs, the excess precision does only | |
2596 | good, but a few programs rely on the precise definition of IEEE floating | |
2597 | point. Use :option:`-ffloat-store` for such programs, after modifying | |
2598 | them to store all pertinent intermediate computations into variables. | |
2599 | ||
2600 | .. option:: -fexcess-precision={style} | |
2601 | ||
2602 | This option allows further control over excess precision on machines | |
2603 | where floating-point operations occur in a format with more precision or | |
2604 | range than the IEEE standard and interchange floating-point types. By | |
2605 | default, :option:`-fexcess-precision=fast` is in effect; this means that | |
2606 | operations may be carried out in a wider precision than the types specified | |
2607 | in the source if that would result in faster code, and it is unpredictable | |
2608 | when rounding to the types specified in the source code takes place. | |
2609 | When compiling C or C++, if :option:`-fexcess-precision=standard` is specified | |
2610 | then excess precision follows the rules specified in ISO C99 or C++; in particular, | |
2611 | both casts and assignments cause values to be rounded to their | |
2612 | semantic types (whereas :option:`-ffloat-store` only affects | |
2613 | assignments). This option is enabled by default for C or C++ if a strict | |
2614 | conformance option such as :option:`-std=c99` or :option:`-std=c++17` is used. | |
2615 | :option:`-ffast-math` enables :option:`-fexcess-precision=fast` by default | |
2616 | regardless of whether a strict conformance option is used. | |
2617 | ||
2618 | :option:`-fexcess-precision=standard` is not implemented for languages | |
2619 | other than C or C++. On the x86, it has no effect if :option:`-mfpmath=sse` | |
2620 | or :option:`-mfpmath=sse+387` is specified; in the former case, IEEE | |
2621 | semantics apply without excess precision, and in the latter, rounding | |
2622 | is unpredictable. | |
2623 | ||
2624 | .. option:: -ffast-math | |
2625 | ||
2626 | Sets the options :option:`-fno-math-errno`, :option:`-funsafe-math-optimizations`, | |
2627 | :option:`-ffinite-math-only`, :option:`-fno-rounding-math`, | |
2628 | :option:`-fno-signaling-nans`, :option:`-fcx-limited-range` and | |
2629 | :option:`-fexcess-precision=fast`. | |
2630 | ||
2631 | This option causes the preprocessor macro ``__FAST_MATH__`` to be defined. | |
2632 | ||
2633 | This option is not turned on by any :option:`-O` option besides | |
2634 | :option:`-Ofast` since it can result in incorrect output for programs | |
2635 | that depend on an exact implementation of IEEE or ISO rules/specifications | |
2636 | for math functions. It may, however, yield faster code for programs | |
2637 | that do not require the guarantees of these specifications. | |
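Code that depends on strict IEEE behavior can test for ``__FAST_MATH__`` at compile time. The following is a minimal sketch; the helper names ``built_with_fast_math`` and ``require_strict_math`` are hypothetical and are not provided by GCC.

```c
#include <assert.h>

/* Hypothetical helper: reports whether this translation unit was
   compiled with -ffast-math, based on the __FAST_MATH__ macro. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical guard: aborts at run time if code that needs strict
   IEEE semantics was built with -ffast-math. */
void require_strict_math(void)
{
    assert(!built_with_fast_math());
}
```

Calling ``require_strict_math()`` early in a numerically sensitive routine makes an accidental :option:`-ffast-math` or :option:`-Ofast` build fail visibly instead of silently changing results.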
2638 | ||
2639 | .. option:: -fno-math-errno | |
2640 | ||
2641 | Do not set ``errno`` after calling math functions that are executed | |
2642 | with a single instruction, e.g., ``sqrt``. A program that relies on | |
2643 | IEEE exceptions for math error handling may want to use this flag | |
2644 | for speed while maintaining IEEE arithmetic compatibility. | |
2645 | ||
2646 | This option is not turned on by any :option:`-O` option since | |
2647 | it can result in incorrect output for programs that depend on | |
2648 | an exact implementation of IEEE or ISO rules/specifications for | |
2649 | math functions. It may, however, yield faster code for programs | |
2650 | that do not require the guarantees of these specifications. | |
2651 | ||
2652 | The default is :option:`-fmath-errno`. | |
2653 | ||
2654 | On Darwin systems, the math library never sets ``errno``. There is | |
2655 | therefore no reason for the compiler to consider the possibility that | |
2656 | it might, and :option:`-fno-math-errno` is the default. | |
2657 | ||
2658 | .. option:: -fmath-errno | |
2659 | ||
2660 | Default setting; overrides :option:`-fno-math-errno`. | |
2661 | ||
2662 | .. option:: -funsafe-math-optimizations | |
2663 | ||
2664 | Allow optimizations for floating-point arithmetic that (a) assume | |
2665 | that arguments and results are valid and (b) may violate IEEE or | |
2666 | ANSI standards. When used at link time, it may include libraries | |
2667 | or startup files that change the default FPU control word or other | |
2668 | similar optimizations. | |
2669 | ||
2670 | This option is not turned on by any :option:`-O` option since | |
2671 | it can result in incorrect output for programs that depend on | |
2672 | an exact implementation of IEEE or ISO rules/specifications for | |
2673 | math functions. It may, however, yield faster code for programs | |
2674 | that do not require the guarantees of these specifications. | |
2675 | Enables :option:`-fno-signed-zeros`, :option:`-fno-trapping-math`, | |
2676 | :option:`-fassociative-math` and :option:`-freciprocal-math`. | |
2677 | ||
2678 | The default is :option:`-fno-unsafe-math-optimizations`. | |
2679 | ||
2680 | .. option:: -fassociative-math | |
2681 | ||
2682 | Allow re-association of operands in series of floating-point operations. | |
2683 | This violates the ISO C and C++ language standards by possibly changing | |
2684 | the computation result. NOTE: re-ordering may change the sign of zero as | |
2685 | well as ignore NaNs and inhibit or create underflow or overflow (and | |
2686 | thus cannot be used on code that relies on rounding behavior like | |
2687 | ``(x + 2**52) - 2**52``). It may also reorder floating-point comparisons | |
2688 | and thus may not be used when ordered comparisons are required. | |
2689 | This option requires that both :option:`-fno-signed-zeros` and | |
2690 | :option:`-fno-trapping-math` be in effect. Moreover, it doesn't make | |
2691 | much sense with :option:`-frounding-math`. For Fortran the option | |
2692 | is automatically enabled when both :option:`-fno-signed-zeros` and | |
2693 | :option:`-fno-trapping-math` are in effect. | |
2694 | ||
2695 | The default is :option:`-fno-associative-math`. | |
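The rounding idiom mentioned above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the code assumes IEEE double arithmetic without excess precision and the default round-to-nearest mode. If re-association were permitted, the compiler would be free to simplify the expression to ``x``, which defeats the trick.

```c
#include <assert.h>

/* Hypothetical helper: rounds a non-negative value below 2**51 to the
   nearest integer (ties to even). It relies on the addition and the
   subtraction being performed in exactly this order. */
double round_with_magic_constant(double x)
{
    const double two52 = 4503599627370496.0;  /* 2**52 */
    assert(x >= 0.0 && x < two52 / 2.0);
    /* Doubles in [2**52, 2**53) are spaced 1 apart, so the sum is
       rounded to an integer; the subtraction is then exact. */
    double shifted = x + two52;
    return shifted - two52;
}
```

For instance, under the stated assumptions ``round_with_magic_constant(2.5)`` yields ``2.0`` and ``round_with_magic_constant(3.5)`` yields ``4.0``.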
2696 | ||
2697 | .. option:: -freciprocal-math | |
2698 | ||
2699 | Allow the reciprocal of a value to be used instead of dividing by | |
2700 | the value if this enables optimizations. For example ``x / y`` | |
2701 | can be replaced with ``x * (1/y)``, which is useful if ``(1/y)`` | |
2702 | is subject to common subexpression elimination. Note that this loses | |
2703 | precision and increases the number of flops operating on the value. | |
2704 | ||
2705 | The default is :option:`-fno-reciprocal-math`. | |
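The loss of precision can be seen with a small sketch. The function names are hypothetical; the code assumes IEEE double arithmetic with round-to-nearest and is compiled without :option:`-freciprocal-math`, so each function performs exactly the operations written.

```c
#include <assert.h>

/* One correctly rounded division. */
double divide_directly(double x, double y)
{
    assert(y != 0.0);
    return x / y;
}

/* Two rounded operations: the reciprocal is rounded first, then the
   product is rounded again, so the result may differ from x / y. */
double divide_via_reciprocal(double x, double y)
{
    assert(y != 0.0);
    double r = 1.0 / y;
    return x * r;
}
```

With ``x`` and ``y`` both equal to ``49.0``, the direct division gives exactly ``1.0``, while the reciprocal form gives a value slightly below ``1.0``.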
2706 | ||
2707 | .. option:: -ffinite-math-only | |
2708 | ||
2709 | Allow optimizations for floating-point arithmetic that assume | |
2710 | that arguments and results are not NaNs or +-Infs. | |
2711 | ||
2712 | This option is not turned on by any :option:`-O` option since | |
2713 | it can result in incorrect output for programs that depend on | |
2714 | an exact implementation of IEEE or ISO rules/specifications for | |
2715 | math functions. It may, however, yield faster code for programs | |
2716 | that do not require the guarantees of these specifications. | |
2717 | ||
2718 | The default is :option:`-fno-finite-math-only`. | |
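As a hedged illustration of code that depends on NaNs and infinities being honored, the sketch below detects them using plain comparisons. The function names are hypothetical. Under :option:`-ffinite-math-only` the compiler may assume these tests are never true, so such checks are not reliable with that option.

```c
#include <assert.h>
#include <math.h>   /* NAN and INFINITY macros */

/* A NaN is the only value that compares unequal to itself. */
int is_nan_by_self_compare(double x)
{
    return x != x;
}

/* Counts the elements that are NaN or infinite: x - x is zero for
   finite x and NaN otherwise. */
int count_non_finite(const double *v, int n)
{
    int count = 0;
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        if (v[i] - v[i] != 0.0)
            ++count;
    return count;
}
```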
2719 | ||
2720 | .. option:: -fno-signed-zeros | |
2721 | ||
2722 | Allow optimizations for floating-point arithmetic that ignore the | |
2723 | signedness of zero. IEEE arithmetic specifies the behavior of | |
2724 | distinct +0.0 and -0.0 values, which then prohibits simplification | |
2725 | of expressions such as x+0.0 or 0.0\*x (even with :option:`-ffinite-math-only`). | |
2726 | This option implies that the sign of a zero result isn't significant. | |
2727 | ||
2728 | The default is :option:`-fsigned-zeros`. | |
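The following sketch shows why these simplifications are not valid when signed zeros matter. The function names are hypothetical, and the code assumes IEEE arithmetic in the default round-to-nearest mode.

```c
#include <assert.h>
#include <math.h>   /* signbit */

/* Not equivalent to returning x: -0.0 + 0.0 is +0.0. */
double add_positive_zero(double x)
{
    return x + 0.0;
}

/* Not equivalent to returning 0.0: for finite negative x the
   product is -0.0. */
double multiply_by_zero(double x)
{
    assert(x - x == 0.0);   /* finite input only */
    return 0.0 * x;
}

/* True only for the value -0.0. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```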
2729 | ||
2730 | .. option:: -fsigned-zeros | |
2731 | ||
2732 | Default setting; overrides :option:`-fno-signed-zeros`. | |
2733 | ||
2734 | .. option:: -fno-trapping-math | |
2735 | ||
2736 | Compile code assuming that floating-point operations cannot generate | |
2737 | user-visible traps. These traps include division by zero, overflow, | |
2738 | underflow, inexact result and invalid operation. This option requires | |
2739 | that :option:`-fno-signaling-nans` be in effect. Setting this option may | |
2740 | allow faster code if one relies on 'non-stop' IEEE arithmetic, for example. | |
2741 | ||
2742 | This option should never be turned on by any :option:`-O` option since | |
2743 | it can result in incorrect output for programs that depend on | |
2744 | an exact implementation of IEEE or ISO rules/specifications for | |
2745 | math functions. | |
2746 | ||
2747 | The default is :option:`-ftrapping-math`. | |
2748 | ||
2749 | Future versions of GCC may provide finer control of this setting | |
2750 | using C99's ``FENV_ACCESS`` pragma. This command-line option | |
2751 | will be used along with :option:`-frounding-math` to specify the | |
2752 | default state for ``FENV_ACCESS``. | |
2753 | ||
2754 | .. option:: -ftrapping-math | |
2755 | ||
2756 | Default setting; overrides :option:`-fno-trapping-math`. | |
2757 | ||
2758 | .. option:: -frounding-math | |
2759 | ||
2760 | Disable transformations and optimizations that assume default floating-point | |
2761 | rounding behavior. This is round-to-zero for all floating point | |
2762 | to integer conversions, and round-to-nearest for all other arithmetic | |
2763 | truncations. This option should be specified for programs that change | |
2764 | the FP rounding mode dynamically, or that may be executed with a | |
2765 | non-default rounding mode. This option disables constant folding of | |
2766 | floating-point expressions at compile time (which may be affected by | |
2767 | rounding mode) and arithmetic transformations that are unsafe in the | |
2768 | presence of sign-dependent rounding modes. | |
2769 | ||
2770 | The default is :option:`-fno-rounding-math`. | |
2771 | ||
2772 | This option is experimental and does not currently guarantee to | |
2773 | disable all GCC optimizations that are affected by rounding mode. | |
2774 | Future versions of GCC may provide finer control of this setting | |
2775 | using C99's ``FENV_ACCESS`` pragma. This command-line option | |
2776 | will be used along with :option:`-ftrapping-math` to specify the | |
2777 | default state for ``FENV_ACCESS``. | |
2778 | ||
2779 | .. option:: -fsignaling-nans | |
2780 | ||
2781 | Compile code assuming that IEEE signaling NaNs may generate user-visible | |
2782 | traps during floating-point operations. Setting this option disables | |
2783 | optimizations that may change the number of exceptions visible with | |
2784 | signaling NaNs. This option implies :option:`-ftrapping-math`. | |
2785 | ||
2786 | This option causes the preprocessor macro ``__SUPPORT_SNAN__`` to | |
2787 | be defined. | |
2788 | ||
2789 | The default is :option:`-fno-signaling-nans`. | |
2790 | ||
2791 | This option is experimental and does not currently guarantee to | |
2792 | disable all GCC optimizations that affect signaling NaN behavior. | |
2793 | ||
2794 | .. option:: -fno-fp-int-builtin-inexact | |
2795 | ||
2796 | Do not allow the built-in functions ``ceil``, ``floor``, | |
2797 | ``round`` and ``trunc``, and their ``float`` and ``long | |
2798 | double`` variants, to generate code that raises the 'inexact' | |
2799 | floating-point exception for noninteger arguments. ISO C99 and C11 | |
2800 | allow these functions to raise the 'inexact' exception, but ISO/IEC | |
2801 | TS 18661-1:2014, the C bindings to IEEE 754-2008, as integrated into | |
2802 | ISO C2X, does not allow these functions to do so. | |
2803 | ||
2804 | The default is :option:`-ffp-int-builtin-inexact`, allowing the | |
2805 | exception to be raised, unless C2X or a later C standard is selected. | |
2806 | This option does nothing unless :option:`-ftrapping-math` is in effect. | |
2807 | ||
2808 | Even if :option:`-fno-fp-int-builtin-inexact` is used, if the functions | |
2809 | generate a call to a library function then the 'inexact' exception | |
2810 | may be raised if the library implementation does not follow TS 18661. | |
2811 | ||
2812 | .. option:: -ffp-int-builtin-inexact | |
2813 | ||
2814 | Default setting; overrides :option:`-fno-fp-int-builtin-inexact`. | |
2815 | ||
2816 | .. option:: -fsingle-precision-constant | |
2817 | ||
2818 | Treat floating-point constants as single precision instead of | |
2819 | implicitly converting them to double-precision constants. | |
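The effect on accuracy can be estimated with a small sketch that compares an unsuffixed constant with its explicitly single-precision form. The function names are hypothetical, and the code assumes IEEE ``float`` and ``double`` formats. When built with :option:`-fsingle-precision-constant`, the two constants would be expected to coincide.

```c
#include <assert.h>

/* Unsuffixed constant: double precision by default. */
double tenth_as_double(void)
{
    return 0.1;
}

/* Explicit single-precision constant, widened on return. */
double tenth_as_single(void)
{
    return 0.1f;
}

/* Difference introduced by using single precision for the constant. */
double constant_gap(void)
{
    double gap = tenth_as_single() - tenth_as_double();
    assert(gap >= 0.0);   /* 0.1f rounds slightly above 0.1 */
    return gap;
}
```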
2820 | ||
2821 | .. option:: -fcx-limited-range | |
2822 | ||
2823 | When enabled, this option states that a range reduction step is not | |
2824 | needed when performing complex division. Also, there is no checking | |
2825 | whether the result of a complex multiplication or division is | |
2826 | ``NaN + I*NaN``, with an attempt to rescue the situation in that case. The | |
2827 | default is :option:`-fno-cx-limited-range`, but the option is enabled by | |
2828 | :option:`-ffast-math`. | |
2829 | ||
2830 | This option controls the default setting of the ISO C99 | |
2831 | ``CX_LIMITED_RANGE`` pragma. Nevertheless, the option applies to | |
2832 | all languages. | |
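To illustrate what range reduction protects against, the sketch below compares the textbook division formula with Smith's method, one well-known scaling technique. It is an illustration under stated assumptions, not a description of the code GCC emits; the ``struct cplx`` type and the function names are hypothetical.

```c
#include <assert.h>

struct cplx { double re, im; };

static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Textbook formula without range reduction: the squared terms can
   overflow or underflow even when the quotient is representable. */
struct cplx divide_naive(struct cplx n, struct cplx d)
{
    double den = d.re * d.re + d.im * d.im;
    struct cplx q = { (n.re * d.re + n.im * d.im) / den,
                      (n.im * d.re - n.re * d.im) / den };
    return q;
}

/* Smith's method: scale by the larger component of the divisor so
   that no intermediate squares are formed. */
struct cplx divide_scaled(struct cplx n, struct cplx d)
{
    struct cplx q;
    assert(d.re != 0.0 || d.im != 0.0);
    if (magnitude(d.re) >= magnitude(d.im)) {
        double r = d.im / d.re;
        double den = d.re + d.im * r;
        q.re = (n.re + n.im * r) / den;
        q.im = (n.im - n.re * r) / den;
    } else {
        double r = d.re / d.im;
        double den = d.re * r + d.im;
        q.re = (n.re * r + n.im) / den;
        q.im = (n.im * r - n.re) / den;
    }
    return q;
}
```

Dividing ``1e200 + 1e200*I`` by itself shows the difference: the naive formula overflows and produces NaN, while the scaled version returns ``1 + 0*I``.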
2833 | ||
2834 | .. option:: -fcx-fortran-rules | |
2835 | ||
2836 | Complex multiplication and division follow Fortran rules. Range | |
2837 | reduction is done as part of complex division, but there is no checking | |
2838 | whether the result of a complex multiplication or division is ``NaN | |
2839 | + I*NaN``, with an attempt to rescue the situation in that case. | |
2840 | ||
2841 | The default is :option:`-fno-cx-fortran-rules`. | |
2842 | ||
2843 | The following options control optimizations that may improve | |
2844 | performance, but are not enabled by any :option:`-O` options. This | |
2845 | section includes experimental options that may produce broken code. | |
2846 | ||
2847 | .. option:: -fbranch-probabilities | |
2848 | ||
2849 | After running a program compiled with :option:`-fprofile-arcs` | |
2850 | (see :ref:`instrumentation-options`), | |
2851 | you can compile it a second time using | |
2852 | :option:`-fbranch-probabilities`, to improve optimizations based on | |
2853 | the number of times each branch was taken. When a program | |
2854 | compiled with :option:`-fprofile-arcs` exits, it saves arc execution | |
2855 | counts to a file called :samp:`{sourcename}.gcda` for each source | |
2856 | file. The information in this data file is very dependent on the | |
2857 | structure of the generated code, so you must use the same source code | |
2858 | and the same optimization options for both compilations. | |
2859 | See details about the file naming in :option:`-fprofile-arcs`. | |
2860 | ||
2861 | With :option:`-fbranch-probabilities`, GCC puts a | |
2862 | :samp:`REG_BR_PROB` note on each :samp:`JUMP_INSN` and :samp:`CALL_INSN`. | |
2863 | These can be used to improve optimization. Currently, they are only | |
2864 | used in one place: in :samp:`reorg.cc`, instead of guessing which path a | |
2865 | branch is most likely to take, the :samp:`REG_BR_PROB` values are used to | |
2866 | exactly determine which path is taken more often. | |
2867 | ||
2868 | Enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
2869 | ||
2870 | .. option:: -fprofile-values | |
2871 | ||
2872 | If combined with :option:`-fprofile-arcs`, it adds code so that some | |
2873 | data about values of expressions in the program is gathered. | |
2874 | ||
2875 | With :option:`-fbranch-probabilities`, it reads back the data gathered | |
2876 | from profiling values of expressions for usage in optimizations. | |
2877 | ||
2878 | Enabled by :option:`-fprofile-generate`, :option:`-fprofile-use`, and | |
2879 | :option:`-fauto-profile`. | |
2880 | ||
2881 | .. option:: -fprofile-reorder-functions | |
2882 | ||
2883 | Reorder functions based on profile instrumentation: the profile records | |
2884 | the time at which each function is first executed, and the functions are | |
2885 | placed in ascending order of that time. | |
2886 | ||
2887 | Enabled with :option:`-fprofile-use`. | |
2888 | ||
2889 | .. option:: -fvpt | |
2890 | ||
2891 | If combined with :option:`-fprofile-arcs`, this option instructs the compiler | |
2892 | to add code to gather information about values of expressions. | |
2893 | ||
2894 | With :option:`-fbranch-probabilities`, it reads back the data gathered | |
2895 | and actually performs the optimizations based on them. | |
2896 | Currently the optimizations include specialization of division operations | |
2897 | using the knowledge about the value of the denominator. | |
2898 | ||
2899 | Enabled with :option:`-fprofile-use` and :option:`-fauto-profile`. | |
2900 | ||
2901 | .. option:: -frename-registers | |
2902 | ||
2903 | Attempt to avoid false dependencies in scheduled code by making use | |
2904 | of registers left over after register allocation. This optimization | |
2905 | most benefits processors with lots of registers. Depending on the | |
2906 | debug information format adopted by the target, however, it can | |
2907 | make debugging impossible, since variables no longer stay in | |
2908 | a 'home register'. | |
2909 | ||
2910 | Enabled by default with :option:`-funroll-loops`. | |
2911 | ||
2912 | .. option:: -fschedule-fusion | |
2913 | ||
2914 | Performs a target-dependent pass over the instruction stream to schedule | |
2915 | instructions of the same type together because the target machine can execute them | |
2916 | more efficiently if they are adjacent to each other in the instruction flow. | |
2917 | ||
2918 | Enabled at levels :option:`-O2`, :option:`-O3`, :option:`-Os`. | |
2919 | ||
2920 | .. option:: -ftracer | |
2921 | ||
2922 | Perform tail duplication to enlarge superblock size. This transformation | |
2923 | simplifies the control flow of the function allowing other optimizations to do | |
2924 | a better job. | |
2925 | ||
2926 | Enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
2927 | ||
2928 | .. option:: -funroll-loops | |
2929 | ||
2930 | Unroll loops whose number of iterations can be determined at compile time or | |
2931 | upon entry to the loop. :option:`-funroll-loops` implies | |
2932 | :option:`-frerun-cse-after-loop`, :option:`-fweb` and :option:`-frename-registers`. | |
2933 | It also turns on complete loop peeling (i.e. complete removal of loops with | |
2934 | a small constant number of iterations). This option makes code larger, and may | |
2935 | or may not make it run faster. | |
2936 | ||
2937 | Enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
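The transformation can be pictured at the source level as follows. This is a hand-written sketch with hypothetical function names and an unroll factor of four chosen for illustration; the compiler works on its internal representation and picks its own factor.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: one addition and one loop test per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Unrolled by four: fewer loop tests and branches, larger code.
   A remainder loop handles the last n % 4 elements. */
long sum_unrolled_by_4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; ++i)
        total += v[i];
    return total;
}
```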
2938 | ||
2939 | .. option:: -funroll-all-loops | |
2940 | ||
2941 | Unroll all loops, even if their number of iterations is uncertain when | |
2942 | the loop is entered. This usually makes programs run more slowly. | |
2943 | :option:`-funroll-all-loops` implies the same options as | |
2944 | :option:`-funroll-loops`. | |
2945 | ||
2946 | .. option:: -fpeel-loops | |
2947 | ||
2948 | Peels loops for which there is enough information that they do not | |
2949 | roll much (from profile feedback or static analysis). It also turns on | |
2950 | complete loop peeling (i.e. complete removal of loops with a small constant | |
2951 | number of iterations). | |
2952 | ||
2953 | Enabled by :option:`-O3`, :option:`-fprofile-use`, and :option:`-fauto-profile`. | |
2954 | ||
2955 | .. option:: -fmove-loop-invariants | |
2956 | ||
2957 | Enables the loop invariant motion pass in the RTL loop optimizer. Enabled | |
2958 | at level :option:`-O1` and higher, except for :option:`-Og`. | |
2959 | ||
2960 | .. option:: -fmove-loop-stores | |
2961 | ||
2962 | Enables the loop store motion pass in the GIMPLE loop optimizer. This | |
2963 | moves invariant stores to after the end of the loop in exchange for | |
2964 | carrying the stored value in a register across the iteration. | |
2965 | Note that for this option to have an effect, :option:`-ftree-loop-im` has to | |
2966 | be enabled as well. Enabled at level :option:`-O1` and higher, except | |
2967 | for :option:`-Og`. | |
2968 | ||
2969 | .. option:: -fsplit-loops | |
2970 | ||
2971 | Split a loop into two if it contains a condition that's always true | |
2972 | for one side of the iteration space and false for the other. | |
2973 | ||
2974 | Enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
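A source-level sketch of the idea follows. The function names are hypothetical, and the split point is computed by hand here, whereas the compiler derives it from the loop condition.

```c
#include <assert.h>

/* Original loop: the condition i < m is true for the first part of
   the iteration space and false for the rest. */
void fill_original(int *v, int n, int m)
{
    for (int i = 0; i < n; ++i) {
        if (i < m)
            v[i] = 1;
        else
            v[i] = 2;
    }
}

/* Split form: two loops, neither of which tests i < m. */
void fill_split(int *v, int n, int m)
{
    assert(n >= 0);
    /* Clamp the split point to the range [0, n]. */
    int split = m < 0 ? 0 : (m > n ? n : m);
    int i = 0;
    for (; i < split; ++i)
        v[i] = 1;
    for (; i < n; ++i)
        v[i] = 2;
}
```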
2975 | ||
2976 | .. option:: -funswitch-loops | |
2977 | ||
2978 | Move branches with loop invariant conditions out of the loop, with duplicates | |
2979 | of the loop on both branches (modified according to result of the condition). | |
2980 | ||
2981 | Enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
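The effect can be sketched at the source level as follows. The function names are hypothetical and the example is hand-written to show the shape of the result rather than actual compiler output.

```c
#include <assert.h>

/* Original loop: the invariant condition is tested on every iteration. */
long sum_original(const int *v, int n, int negate)
{
    long total = 0;
    for (int i = 0; i < n; ++i) {
        if (negate)
            total -= v[i];
        else
            total += v[i];
    }
    return total;
}

/* Unswitched form: the condition is tested once, and each branch has
   its own copy of the loop specialized for that outcome. */
long sum_unswitched(const int *v, int n, int negate)
{
    long total = 0;
    assert(n >= 0);
    if (negate) {
        for (int i = 0; i < n; ++i)
            total -= v[i];
    } else {
        for (int i = 0; i < n; ++i)
            total += v[i];
    }
    return total;
}
```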
2982 | ||
2983 | .. option:: -fversion-loops-for-strides | |
2984 | ||
2985 | If a loop iterates over an array with a variable stride, create another | |
2986 | version of the loop that assumes the stride is always one. For example: | |
2987 | ||
2988 | .. code-block:: c++ | |
2989 | ||
2990 | for (int i = 0; i < n; ++i) | |
2991 | x[i * stride] = ...; | |
2992 | ||
2993 | becomes: | |
2994 | ||
2995 | .. code-block:: c++ | |
2996 | ||
2997 | if (stride == 1) | |
2998 | for (int i = 0; i < n; ++i) | |
2999 | x[i] = ...; | |
3000 | else | |
3001 | for (int i = 0; i < n; ++i) | |
3002 | x[i * stride] = ...; | |
3003 | ||
3004 | This is particularly useful for assumed-shape arrays in Fortran where | |
3005 | (for example) it allows better vectorization assuming contiguous accesses. | |
3006 | This flag is enabled by default at :option:`-O3`. | |
3007 | It is also enabled by :option:`-fprofile-use` and :option:`-fauto-profile`. | |
3008 | ||
3009 | .. option:: -ffunction-sections, -fdata-sections | |
3010 | ||
3011 | Place each function or data item into its own section in the output | |
3012 | file if the target supports arbitrary sections. The name of the | |
3013 | function or the name of the data item determines the section's name | |
3014 | in the output file. | |
3015 | ||
3016 | Use these options on systems where the linker can perform optimizations to | |
3017 | improve locality of reference in the instruction space. Most systems using the | |
3018 | ELF object format have linkers with such optimizations. On AIX, the linker | |
3019 | rearranges sections (CSECTs) based on the call graph. The performance impact | |
3020 | varies. | |
3021 | ||
3022 | Together with linker garbage collection (the linker's :option:`--gc-sections` | |
3023 | option), these options may lead to smaller statically-linked executables (after | |
3024 | stripping). | |
3025 | ||
3026 | On ELF/DWARF systems these options do not degrade the quality of the debug | |
3027 | information. There could be issues with other object files/debug info formats. | |
3028 | ||
3029 | Only use these options when there are significant benefits from doing so. When | |
3030 | you specify these options, the assembler and linker create larger object and | |
3031 | executable files and are also slower. These options affect code generation. | |
3032 | They prevent optimizations by the compiler and assembler using relative | |
3033 | locations inside a translation unit since the locations are unknown until | |
3034 | link time. An example of such an optimization is relaxing calls to short call | |
3035 | instructions. | |
3036 | ||
3037 | .. option:: -fstdarg-opt | |
3038 | ||
3039 | Optimize the prologue of variadic argument functions with respect to usage of | |
3040 | those arguments. | |
3041 | ||
3042 | .. option:: -fsection-anchors | |
3043 | ||
3044 | Try to reduce the number of symbolic address calculations by using | |
3045 | shared 'anchor' symbols to address nearby objects. This transformation | |
3046 | can help to reduce the number of GOT entries and GOT accesses on some | |
3047 | targets. | |
3048 | ||
3049 | For example, the implementation of the following function ``foo``: | |
3050 | ||
3051 | .. code-block:: c++ | |
3052 | ||
3053 | static int a, b, c; | |
3054 | int foo (void) { return a + b + c; } | |
3055 | ||
3056 | usually calculates the addresses of all three variables, but if you | |
3057 | compile it with :option:`-fsection-anchors`, it accesses the variables | |
3058 | from a common anchor point instead. The effect is similar to the | |
3059 | following pseudocode (which isn't valid C): | |
3060 | ||
3061 | .. code-block:: c++ | |
3062 | ||
3063 | int foo (void) | |
3064 | { | |
3065 | register int *xr = &x; | |
3066 | return xr[&a - &x] + xr[&b - &x] + xr[&c - &x]; | |
3067 | } | |
3068 | ||
3069 | Not all targets support this option. | |
3070 | ||
3071 | .. option:: -fzero-call-used-regs={choice} | |
3072 | ||
3073 | Zero call-used registers at function return to increase program | |
3074 | security by either mitigating Return-Oriented Programming (ROP) | |
3075 | attacks or preventing information leakage through registers. | |
3076 | ||
3077 | The possible values of :samp:`{choice}` are the same as for the | |
3078 | ``zero_call_used_regs`` attribute (see :ref:`function-attributes`). | |
3079 | The default is :samp:`skip`. | |
3080 | ||
3081 | You can control this behavior for a specific function by using the function | |
3082 | attribute ``zero_call_used_regs`` (see :ref:`function-attributes`). | |
3083 | ||
3084 | .. option:: --param {name}={value} | |
3085 | ||
3086 | In some places, GCC uses various constants to control the amount of | |
3087 | optimization that is done. For example, GCC does not inline functions | |
3088 | that contain more than a certain number of instructions. You can | |
3089 | control some of these constants on the command line using the | |
3090 | :option:`--param` option. | |
3091 | ||
3092 | The names of specific parameters, and the meaning of the values, are | |
3093 | tied to the internals of the compiler, and are subject to change | |
3094 | without notice in future releases. | |
3095 | ||
3096 | To get the minimum, maximum and default values of a parameter, | |
3097 | use the :option:`--help=param -Q` options. | |
3098 | ||
3099 | In each case, the :samp:`{value}` is an integer. The following choices | |
3100 | of :samp:`{name}` are recognized for all targets: | |
3101 | ||
3102 | .. gcc-param:: predictable-branch-outcome | |
3103 | ||
3104 | When a branch is predicted to be taken with probability lower than this threshold | |
3105 | (in percent), it is considered well predictable. | |
3106 | ||
3107 | .. gcc-param:: max-rtl-if-conversion-insns | |
3108 | ||
3109 | RTL if-conversion tries to remove conditional branches around a block and | |
3110 | replace them with conditionally executed instructions. This parameter | |
3111 | gives the maximum number of instructions in a block which should be | |
3112 | considered for if-conversion. The compiler will | |
3113 | also use other heuristics to decide whether if-conversion is likely to be | |
3114 | profitable. | |
3115 | ||
3116 | .. gcc-param:: max-rtl-if-conversion-predictable-cost | |
3117 | ||
3118 | RTL if-conversion will try to remove conditional branches around a block | |
3119 | and replace them with conditionally executed instructions. These parameters | |
3120 | give the maximum permissible cost for the sequence that would be generated | |
3121 | by if-conversion depending on whether the branch is statically determined | |
3122 | to be predictable or not. The units for this parameter are the same as | |
3123 | those for the GCC internal seq_cost metric. The compiler will try to | |
3124 | provide a reasonable default for this parameter using the BRANCH_COST | |
3125 | target macro. | |
3126 | ||
3127 | .. gcc-param:: max-crossjump-edges | |
3128 | ||
3129 | The maximum number of incoming edges to consider for cross-jumping. | |
3130 | The algorithm used by :option:`-fcrossjumping` is O(N^2) in | |
3131 | the number of edges incoming to each block. Increasing values mean | |
3132 | more aggressive optimization, which increases compilation time for a | |
3133 | probably small improvement in executable size. | |
3134 | ||
3135 | .. gcc-param:: min-crossjump-insns | |
3136 | ||
3137 | The minimum number of instructions that must be matched at the end | |
3138 | of two blocks before cross-jumping is performed on them. This | |
3139 | value is ignored in the case where all instructions in the block being | |
3140 | cross-jumped from are matched. | |
3141 | ||
3142 | .. gcc-param:: max-grow-copy-bb-insns | |
3143 | ||
3144 | The maximum code size expansion factor when copying basic blocks | |
3145 | instead of jumping. The expansion is relative to a jump instruction. | |
3146 | ||
3147 | .. gcc-param:: max-goto-duplication-insns | |
3148 | ||
3149 | The maximum number of instructions to duplicate to a block that jumps | |
3150 | to a computed goto. To avoid O(N^2) behavior in a number of | |
3151 | passes, GCC factors computed gotos early in the compilation process, | |
3152 | and unfactors them as late as possible. Only computed jumps at the | |
3153 | end of a basic blocks with no more than max-goto-duplication-insns are | |
3154 | unfactored. | |
3155 | ||
3156 | .. gcc-param:: max-delay-slot-insn-search | |
3157 | ||
3158 | The maximum number of instructions to consider when looking for an | |
3159 | instruction to fill a delay slot. If more than this arbitrary number of | |
3160 | instructions are searched, the time savings from filling the delay slot | |
3161 | are minimal, so stop searching. Increasing values mean more | |
3162 | aggressive optimization, which increases compilation time for a probably | |
3163 | small improvement in execution time. | |
3164 | ||
3165 | .. gcc-param:: max-delay-slot-live-search | |
3166 | ||
3167 | When trying to fill delay slots, the maximum number of instructions to | |
3168 | consider when searching for a block with valid live register | |
3169 | information. Increasing this arbitrarily chosen value means more | |
3170 | aggressive optimization, increasing the compilation time. This parameter | |
3171 | should be removed when the delay slot code is rewritten to maintain the | |
3172 | control-flow graph. | |
3173 | ||
3174 | .. gcc-param:: max-gcse-memory | |
3175 | ||
3176 | The approximate maximum amount of memory in ``kB`` that can be allocated in | |
3177 | order to perform the global common subexpression elimination | |
3178 | optimization. If more memory than specified is required, the | |
3179 | optimization is not done. | |
3180 | ||
3181 | .. gcc-param:: max-gcse-insertion-ratio | |
3182 | ||
3183 | If the ratio of expression insertions to deletions is larger than this value | |
3184 | for any expression, then RTL PRE does not insert or remove the expression and thus | |
3185 | leaves partially redundant computations in the instruction stream. | |
3186 | ||
3187 | .. gcc-param:: max-pending-list-length | |
3188 | ||
3189 | The maximum number of pending dependencies scheduling allows | |
3190 | before flushing the current state and starting over. Large functions | |
3191 | with few branches or calls can create excessively large lists which | |
3192 | needlessly consume memory and resources. | |
3193 | ||
3194 | .. gcc-param:: max-modulo-backtrack-attempts | |
3195 | ||
3196 | The maximum number of backtrack attempts the scheduler should make | |
3197 | when modulo scheduling a loop. Larger values can exponentially increase | |
3198 | compilation time. | |
3199 | ||
3200 | .. gcc-param:: max-inline-functions-called-once-loop-depth | |
3201 | ||
3202 | Maximal loop depth of a call considered by inline heuristics that try to | |
3203 | inline all functions called once. | |
3204 | ||
3205 | .. gcc-param:: max-inline-functions-called-once-insns | |
3206 | ||
3207 | Maximal estimated size of functions produced while inlining functions called | |
3208 | once. | |
3209 | ||
3210 | .. gcc-param:: max-inline-insns-single | |
3211 | ||
3212 | Several parameters control the tree inliner used in GCC. This number sets the | |
3213 | maximum number of instructions (counted in GCC's internal representation) in a | |
3214 | single function that the tree inliner considers for inlining. This only | |
3215 | affects functions declared inline and methods implemented in a class | |
3216 | declaration (C++). | |
3217 | ||
3218 | .. gcc-param:: max-inline-insns-auto | |
3219 | ||
3220 | When you use :option:`-finline-functions` (included in :option:`-O3`), | |
3221 | a lot of functions that would otherwise not be considered for inlining | |
3222 | by the compiler are investigated. To those functions, a different | |
3223 | (more restrictive) limit compared to functions declared inline can | |
3224 | be applied (:option:`--param` :gcc-param:`max-inline-insns-auto`). | |
3225 | ||
3226 | .. gcc-param:: max-inline-insns-small | |
3227 | ||
3228 | This is the bound applied to calls that are considered relevant with | |
3229 | :option:`-finline-small-functions`. | |
3230 | ||
3231 | .. gcc-param:: max-inline-insns-size | |
3232 | ||
3233 | This is the bound applied to calls that are optimized for size. Small growth | |
3234 | may be desirable to anticipate optimization opportunities exposed by inlining. | |
3235 | ||
3236 | .. gcc-param:: uninlined-function-insns | |
3237 | ||
3238 | Number of instructions accounted by inliner for function overhead such as | |
3239 | function prologue and epilogue. | |
3240 | ||
3241 | .. gcc-param:: uninlined-function-time | |
3242 | ||
3243 | Extra time accounted by inliner for function overhead such as time needed to | |
3244 | execute function prologue and epilogue. | |
3245 | ||
3246 | .. gcc-param:: inline-heuristics-hint-percent | |
3247 | ||
3248 | The scale (in percents) applied to inline-insns-single, | |
3249 | inline-insns-single-O2, inline-insns-auto | |
3250 | when inline heuristics hint that inlining is | |
3251 | very profitable (will enable later optimizations). | |
3252 | ||
3253 | .. gcc-param:: uninlined-thunk-insns | |
3254 | uninlined-thunk-time | |
3255 | ||
3256 | Same as :option:`--param` :gcc-param:`uninlined-function-insns` and | |
3257 | :option:`--param` :gcc-param:`uninlined-function-time` but applied to function thunks. | |
3258 | ||
3259 | .. gcc-param:: inline-min-speedup | |
3260 | ||
3261 | When estimated performance improvement of caller + callee runtime exceeds this | |
3262 | threshold (in percent), the function can be inlined regardless of the limit on | |
3263 | :option:`--param` :gcc-param:`max-inline-insns-single` and :option:`--param` | |
3264 | :gcc-param:`max-inline-insns-auto`. | |
3265 | ||
3266 | .. gcc-param:: large-function-insns | |
3267 | ||
3268 | The limit specifying really large functions. For functions larger than this | |
3269 | limit after inlining, inlining is constrained by | |
3270 | :option:`--param` :gcc-param:`large-function-growth`. This parameter is useful primarily | |
3271 | to avoid extreme compilation time caused by non-linear algorithms used by the | |
3272 | back end. | |
3273 | ||
3274 | .. gcc-param:: large-function-growth | |
3275 | ||
3276 | Specifies maximal growth of large function caused by inlining in percents. | |
3277 | For example, parameter value 100 limits large function growth to 2.0 times | |
3278 | the original size. | |
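The percentage-to-multiplier arithmetic used by the growth parameters can be sketched as follows. This is only an illustration of the arithmetic described in the text, not GCC's implementation; the function name ``growth_limit_factor`` is hypothetical.

```python
def growth_limit_factor(percent: int) -> float:
    """Convert a growth limit given in percent into a size multiplier.

    A limit of P percent allows the result to be up to (100 + P) / 100
    times the original size.
    """
    return (100 + percent) / 100


# Values taken from the examples in this section.
print(growth_limit_factor(100))   # large-function-growth example
print(growth_limit_factor(20))    # inline-unit-growth example
print(growth_limit_factor(1000))  # large-stack-frame-growth example
```

The same conversion applies to the examples given for inline-unit-growth, ipa-cp-unit-growth and large-stack-frame-growth.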
3279 | ||
3280 | .. gcc-param:: large-unit-insns | |
3281 | ||
3282 | The limit specifying a large translation unit. Growth caused by inlining of | |
3283 | units larger than this limit is limited by :option:`--param` :gcc-param:`inline-unit-growth`. | |
3284 | For small units this might be too tight. | |
3285 | For example, consider a unit consisting of function A | |
3286 | that is inline and B that just calls A three times. If B is small relative to | |
3287 | A, the growth of unit is 300\% and yet such inlining is very sane. For very | |
3288 | large units consisting of small inlineable functions, however, the overall unit | |
3289 | growth limit is needed to avoid exponential explosion of code size. Thus for | |
3290 | smaller units, the size is increased to :option:`--param` :gcc-param:`large-unit-insns` | |
3291 | before applying :option:`--param` :gcc-param:`inline-unit-growth`. | |
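The interaction between large-unit-insns and inline-unit-growth described above can be sketched roughly as follows. This is a simplified model of the text, not the inliner's actual code; the function name and the numeric values in the usage lines are hypothetical and are not GCC defaults.

```python
def inline_unit_size_limit(unit_insns: int,
                           large_unit_insns: int,
                           inline_unit_growth: int) -> int:
    """Estimate the size a unit may reach through inlining.

    Units smaller than large_unit_insns are treated as if they already
    had that size, then the percentage growth limit is applied.
    """
    base = max(unit_insns, large_unit_insns)
    return base * (100 + inline_unit_growth) // 100


# Illustrative values only (not GCC defaults).
print(inline_unit_size_limit(500, 10000, 40))    # small unit
print(inline_unit_size_limit(50000, 10000, 40))  # large unit
```

Under this model a small unit gets far more headroom relative to its own size than a large one, which is the effect the paragraph describes.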
3292 | ||
3293 | .. gcc-param:: lazy-modules | |
3294 | ||
3295 | Maximum number of concurrently open C++ module files when lazy loading. | |
3296 | ||
3297 | .. gcc-param:: inline-unit-growth | |
3298 | ||
3299 | Specifies maximal overall growth of the compilation unit caused by inlining. | |
3300 | For example, parameter value 20 limits unit growth to 1.2 times the original | |
3301 | size. Cold functions (either marked cold via an attribute or by profile | |
3302 | feedback) are not accounted into the unit size. | |
3303 | ||
3304 | .. gcc-param:: ipa-cp-unit-growth | |
3305 | ||
3306 | Specifies maximal overall growth of the compilation unit caused by | |
3307 | interprocedural constant propagation. For example, parameter value 10 limits | |
3308 | unit growth to 1.1 times the original size. | |
3309 | ||
3310 | .. gcc-param:: ipa-cp-large-unit-insns | |
3311 | ||
3312 | The size of a translation unit that the IPA-CP pass considers large. | |
3313 | ||
3314 | .. gcc-param:: large-stack-frame | |
3315 | ||
3316 | The limit specifying large stack frames. While inlining, the algorithm tries | |
3317 | not to grow past this limit too much. | |
3318 | ||
3319 | .. gcc-param:: large-stack-frame-growth | |
3320 | ||
3321 | Specifies maximal growth of large stack frames caused by inlining in percents. | |
3322 | For example, parameter value 1000 limits large stack frame growth to 11 times | |
3323 | the original size. | |
3324 | ||
3325 | .. gcc-param:: max-inline-insns-recursive | |
3326 | max-inline-insns-recursive-auto | |
3327 | ||
3328 | Specifies the maximum number of instructions an out-of-line copy of a | |
3329 | self-recursive inline | |
3330 | function can grow into by performing recursive inlining. | |
3331 | ||
3332 | :option:`--param` :gcc-param:`max-inline-insns-recursive` applies to functions | |
3333 | declared inline. | |
3334 | For functions not declared inline, recursive inlining | |
3335 | happens only when :option:`-finline-functions` (included in :option:`-O3`) is | |
3336 | enabled; :option:`--param` :gcc-param:`max-inline-insns-recursive-auto` applies instead. | |
3337 | ||
3338 | .. gcc-param:: max-inline-recursive-depth | |
3339 | max-inline-recursive-depth-auto | |
3340 | ||
3341 | Specifies the maximum recursion depth used for recursive inlining. | |
3342 | ||
3343 | :option:`--param` :gcc-param:`max-inline-recursive-depth` applies to functions | |
3344 | declared inline. For functions not declared inline, recursive inlining | |
3345 | happens only when :option:`-finline-functions` (included in :option:`-O3`) is | |
3346 | enabled; :option:`--param` :gcc-param:`max-inline-recursive-depth-auto` applies instead. | |
3347 | ||
3348 | .. gcc-param:: min-inline-recursive-probability | |
3349 | ||
3350 | Recursive inlining is profitable only for functions that have deep recursion | |
3351 | on average, and can hurt functions that have little recursion depth by | |
3352 | increasing the prologue size or the complexity of the function body seen by other | |
3353 | optimizers. | |
3354 | ||
3355 | When profile feedback is available (see :option:`-fprofile-generate`) the actual | |
3356 | recursion depth can be guessed from the probability that function recurses | |
3357 | via a given call expression. This parameter limits inlining only to call | |
3358 | expressions whose probability exceeds the given threshold (in percents). | |
3359 | ||
3360 | .. gcc-param:: early-inlining-insns | |
3361 | ||
3362 | Specify growth that the early inliner can make. In effect it increases | |
3363 | the amount of inlining for code having a large abstraction penalty. | |
3364 | ||
3365 | .. gcc-param:: max-early-inliner-iterations | |
3366 | ||
3367 | Limit of iterations of the early inliner. This basically bounds | |
3368 | the number of nested indirect calls the early inliner can resolve. | |
3369 | Deeper chains are still handled by late inlining. | |
3370 | ||
3371 | .. gcc-param:: comdat-sharing-probability | |
3372 | ||
3373 | Probability (in percent) that C++ inline functions with comdat visibility | |
3374 | are shared across multiple compilation units. | |
3375 | ||
3376 | .. gcc-param:: modref-max-bases | |
3377 | modref-max-refs | |
3378 | modref-max-accesses | |
3379 | ||
3380 | Specifies the maximal number of base pointers, references and accesses stored | |
3381 | for a single function by mod/ref analysis. | |
3382 | ||
3383 | .. gcc-param:: modref-max-tests | |
3384 | ||
3385 | Specifies the maximal number of tests the alias oracle can perform to disambiguate | |
3386 | memory locations using the mod/ref information. This parameter ought to be | |
3387 | bigger than :option:`--param` :gcc-param:`modref-max-bases` and :option:`--param` | |
3388 | :gcc-param:`modref-max-refs`. | |
3389 | ||
3390 | .. gcc-param:: modref-max-depth | |
3391 | ||
3392 | Specifies the maximum depth of DFS walk used by modref escape analysis. | |
3393 | Setting to 0 disables the analysis completely. | |
3394 | ||
3395 | .. gcc-param:: modref-max-escape-points | |
3396 | ||
3397 | Specifies the maximum number of escape points tracked by modref per SSA-name. | |
3398 | ||
3399 | .. gcc-param:: modref-max-adjustments | |
3400 | ||
3401 | Specifies the maximum number of times the access range is enlarged during modref dataflow | |
3402 | analysis. | |
3403 | ||
3404 | .. gcc-param:: profile-func-internal-id | |
3405 | ||
3406 | A parameter to control whether to use function internal id in profile | |
3407 | database lookup. If the value is 0, the compiler uses an id that | |
3408 | is based on function assembler name and filename, which makes old profile | |
3409 | data more tolerant to source changes such as function reordering etc. | |
3410 | ||
3411 | .. gcc-param:: min-vect-loop-bound | |
3412 | ||
3413 | The minimum number of iterations under which loops are not vectorized | |
3414 | when :option:`-ftree-vectorize` is used. The number of iterations after | |
3415 | vectorization needs to be greater than the value specified by this option | |
3416 | to allow vectorization. | |
3417 | ||
3418 | .. gcc-param:: gcse-cost-distance-ratio | |
3419 | ||
3420 | Scaling factor in calculation of maximum distance an expression | |
3421 | can be moved by GCSE optimizations. This is currently supported only in the | |
3422 | code hoisting pass. The bigger the ratio, the more aggressive code hoisting | |
3423 | is with simple expressions, i.e., the expressions that have cost | |
3424 | less than gcse-unrestricted-cost. Specifying 0 disables | |
3425 | hoisting of simple expressions. | |
3426 | ||
3427 | .. gcc-param:: gcse-unrestricted-cost | |
3428 | ||
3429 | Cost, roughly measured as the cost of a single typical machine | |
3430 | instruction, at which GCSE optimizations do not constrain | |
3431 | the distance an expression can travel. This is currently | |
3432 | supported only in the code hoisting pass. The lesser the cost, | |
3433 | the more aggressive code hoisting is. Specifying 0 | |
3434 | allows all expressions to travel unrestricted distances. | |
3435 | ||
3436 | .. gcc-param:: max-hoist-depth | |
3437 | ||
3438 | The depth of search in the dominator tree for expressions to hoist. | |
3439 | This is used to avoid quadratic behavior in hoisting algorithm. | |
3440 | A value of 0 does not limit the search, but may slow down compilation | |
3441 | of huge functions. | |
3442 | ||
3443 | .. gcc-param:: max-tail-merge-comparisons | |
3444 | ||
3445 | The maximum number of similar basic blocks to compare a basic block with. This is used to | |
3446 | avoid quadratic behavior in tree tail merging. | |
3447 | ||
3448 | .. gcc-param:: max-tail-merge-iterations | |
3449 | ||
3450 | The maximum number of iterations of the pass over the function. This is used to | |
3451 | limit compilation time in tree tail merging. | |
3452 | ||
3453 | .. gcc-param:: store-merging-allow-unaligned | |
3454 | ||
3455 | Allow the store merging pass to introduce unaligned stores if it is legal to | |
3456 | do so. | |
3457 | ||
3458 | .. gcc-param:: max-stores-to-merge | |
3459 | ||
3460 | The maximum number of stores to attempt to merge into wider stores in the store | |
3461 | merging pass. | |
3462 | ||
3463 | .. gcc-param:: max-store-chains-to-track | |
3464 | ||
3465 | The maximum number of store chains to track at the same time in the attempt | |
3466 | to merge them into wider stores in the store merging pass. | |
3467 | ||
3468 | .. gcc-param:: max-stores-to-track | |
3469 | ||
3470 | The maximum number of stores to track at the same time in the attempt | |
3471 | to merge them into wider stores in the store merging pass. | |
3472 | ||
3473 | .. gcc-param:: max-unrolled-insns | |
3474 | ||
3475 | The maximum number of instructions that a loop may have to be unrolled. | |
3476 | If a loop is unrolled, this parameter also determines how many times | |
3477 | the loop code is unrolled. | |
3478 | ||
3479 | .. gcc-param:: max-average-unrolled-insns | |
3480 | ||
3481 | The maximum number of instructions biased by probabilities of their execution | |
3482 | that a loop may have to be unrolled. If a loop is unrolled, | |
3483 | this parameter also determines how many times the loop code is unrolled. | |
3484 | ||
3485 | .. gcc-param:: max-unroll-times | |
3486 | ||
3487 | The maximum number of unrollings of a single loop. | |
3488 | ||
3489 | .. gcc-param:: max-peeled-insns | |
3490 | ||
3491 | The maximum number of instructions that a loop may have to be peeled. | |
3492 | If a loop is peeled, this parameter also determines how many times | |
3493 | the loop code is peeled. | |
3494 | ||
3495 | .. gcc-param:: max-peel-times | |
3496 | ||
3497 | The maximum number of peelings of a single loop. | |
3498 | ||
3499 | .. gcc-param:: max-peel-branches | |
3500 | ||
3501 | The maximum number of branches on the hot path through the peeled sequence. | |
3502 | ||
3503 | .. gcc-param:: max-completely-peeled-insns | |
3504 | ||
3505 | The maximum number of insns of a completely peeled loop. | |
3506 | ||
3507 | .. gcc-param:: max-completely-peel-times | |
3508 | ||
3509 | The maximum number of iterations of a loop to be suitable for complete peeling. | |
3510 | ||
3511 | .. gcc-param:: max-completely-peel-loop-nest-depth | |
3512 | ||
3513 | The maximum depth of a loop nest suitable for complete peeling. | |
3514 | ||
3515 | .. gcc-param:: max-unswitch-insns | |
3516 | ||
3517 | The maximum number of insns of an unswitched loop. | |
3518 | ||
3519 | .. gcc-param:: lim-expensive | |
3520 | ||
3521 | The minimum cost of an expensive expression in the loop invariant motion. | |
3522 | ||
3523 | .. gcc-param:: min-loop-cond-split-prob | |
3524 | ||
3525 | When FDO profile information is available, min-loop-cond-split-prob | |
3526 | specifies the minimum threshold for the probability of a semi-invariant condition | |
3527 | statement to trigger a loop split. | |
3528 | ||
3529 | .. gcc-param:: iv-consider-all-candidates-bound | |
3530 | ||
3531 | Bound on number of candidates for induction variables, below which | |
3532 | all candidates are considered for each use in induction variable | |
3533 | optimizations. If there are more candidates than this, | |
3534 | only the most relevant ones are considered to avoid quadratic time complexity. | |
3535 | ||
3536 | .. gcc-param:: iv-max-considered-uses | |
3537 | ||
3538 | The induction variable optimizations give up on loops that contain more | |
3539 | induction variable uses than this value. | |
3540 | ||
3541 | .. gcc-param:: iv-always-prune-cand-set-bound | |
3542 | ||
3543 | If the number of candidates in the set is smaller than this value, | |
3544 | always try to remove unnecessary ivs from the set | |
3545 | when adding a new one. | |
3546 | ||
3547 | .. gcc-param:: avg-loop-niter | |
3548 | ||
3549 | Average number of iterations of a loop. | |
3550 | ||
3551 | .. gcc-param:: dse-max-object-size | |
3552 | ||
3553 | Maximum size (in bytes) of objects tracked bytewise by dead store elimination. | |
3554 | Larger values may result in larger compilation times. | |
3555 | ||
3556 | .. gcc-param:: dse-max-alias-queries-per-store | |
3557 | ||
3558 | Maximum number of queries into the alias oracle per store. | |
3559 | Larger values result in larger compilation times and may result in more | |
3560 | removed dead stores. | |
3561 | ||
3562 | .. gcc-param:: scev-max-expr-size | |
3563 | ||
3564 | Bound on size of expressions used in the scalar evolutions analyzer. | |
3565 | Large expressions slow the analyzer. | |
3566 | ||
3567 | .. gcc-param:: scev-max-expr-complexity | |
3568 | ||
3569 | Bound on the complexity of the expressions in the scalar evolutions analyzer. | |
3570 | Complex expressions slow the analyzer. | |
3571 | ||
3572 | .. gcc-param:: max-tree-if-conversion-phi-args | |
3573 | ||
3574 | Maximum number of arguments in a PHI supported by TREE if conversion | |
3575 | unless the loop is marked with simd pragma. | |
3576 | ||
3577 | .. gcc-param:: vect-max-layout-candidates | |
3578 | ||
3579 | The maximum number of possible vector layouts (such as permutations) | |
3580 | to consider when optimizing to-be-vectorized code. | |
3581 | ||
3582 | .. gcc-param:: vect-max-version-for-alignment-checks | |
3583 | ||
3584 | The maximum number of run-time checks that can be performed when | |
3585 | doing loop versioning for alignment in the vectorizer. | |
3586 | ||
3587 | .. gcc-param:: vect-max-version-for-alias-checks | |
3588 | ||
3589 | The maximum number of run-time checks that can be performed when | |
3590 | doing loop versioning for alias in the vectorizer. | |
3591 | ||
3592 | .. gcc-param:: vect-max-peeling-for-alignment | |
3593 | ||
3594 | The maximum number of loop peels to enhance access alignment | |
3595 | for vectorizer. Value -1 means no limit. | |
3596 | ||
3597 | .. gcc-param:: max-iterations-to-track | |
3598 | ||
3599 | The maximum number of iterations of a loop the brute-force algorithm | |
3600 | for analysis of the number of iterations of the loop tries to evaluate. | |
3601 | ||
3602 | .. gcc-param:: hot-bb-count-fraction | |
3603 | ||
3604 | The denominator n of fraction 1/n of the maximal execution count of a | |
3605 | basic block in the entire program that a basic block needs to at least | |
3606 | have in order to be considered hot. The default is 10000, which means | |
3607 | that a basic block is considered hot if its execution count is greater | |
3608 | than 1/10000 of the maximal execution count. 0 means that it is never | |
3609 | considered hot. Used in non-LTO mode. | |
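The count-based hotness test described above can be sketched as follows. This is a reading of the prose rather than GCC's code: the function name is hypothetical, and treating a count exactly equal to the threshold as hot is an assumption, since the text says both "at least" and "greater than".

```python
def is_hot_by_count(bb_count: int, max_bb_count: int,
                    hot_bb_count_fraction: int = 10000) -> bool:
    """Return True if a basic block counts as hot under the 1/n rule.

    A fraction of 0 means no block is ever considered hot.
    Integer arithmetic avoids rounding: count >= max / n is
    rewritten as count * n >= max.
    """
    if hot_bb_count_fraction == 0:
        return False
    # Assumption: the boundary case (exactly 1/n of the maximum) is hot.
    return bb_count * hot_bb_count_fraction >= max_bb_count


# With the documented default of 10000 and a hottest block
# executed 1,000,000 times, the threshold is 100 executions.
print(is_hot_by_count(200, 1_000_000))
print(is_hot_by_count(50, 1_000_000))
```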
3610 | ||
3611 | .. gcc-param:: hot-bb-count-ws-permille | |
3612 | ||
3613 | The number of most executed permilles, ranging from 0 to 1000, of the | |
3614 | profiled execution of the entire program of which the execution count | |
3615 | of a basic block must be part in order to be considered hot. The | |
3616 | default is 990, which means that a basic block is considered hot if | |
3617 | its execution count contributes to the upper 990 permilles, or 99.0%, | |
3618 | of the profiled execution of the entire program. 0 means that it is | |
3619 | never considered hot. Used in LTO mode. | |
3620 | ||
3621 | .. gcc-param:: hot-bb-frequency-fraction | |
3622 | ||
3623 | The denominator n of fraction 1/n of the execution frequency of the | |
3624 | entry block of a function that a basic block of this function needs | |
3625 | to at least have in order to be considered hot. The default is 1000, | |
3626 | which means that a basic block is considered hot in a function if it | |
3627 | is executed more frequently than 1/1000 of the frequency of the entry | |
3628 | block of the function. 0 means that it is never considered hot. | |
3629 | ||
3630 | .. gcc-param:: unlikely-bb-count-fraction | |
3631 | ||
3632 | The denominator n of fraction 1/n of the number of profiled runs of | |
3633 | the entire program below which the execution count of a basic block | |
3634 | must be in order for the basic block to be considered unlikely executed. | |
3635 | The default is 20, which means that a basic block is considered unlikely | |
3636 | executed if it is executed in fewer than 1/20, or 5%, of the runs of | |
3637 | the program. 0 means that it is always considered unlikely executed. | |
3638 | ||
3639 | .. gcc-param:: max-predicted-iterations | |
3640 | ||
3641 | The maximum number of loop iterations we predict statically. This is useful | |
3642 | in cases where a function contains a single loop with known bound and | |
3643 | another loop with unknown bound. | |
3644 | The known number of iterations is predicted correctly, while | |
3645 | the unknown number of iterations averages to roughly 10. This means that the | |
3646 | loop without bounds appears artificially cold relative to the other one. | |
3647 | ||
3648 | .. gcc-param:: builtin-expect-probability | |
3649 | ||
3650 | Control the probability of the expression having the specified value. This | |
3651 | parameter takes a percentage (i.e. 0 ... 100) as input. | |
3652 | ||
3653 | .. gcc-param:: builtin-string-cmp-inline-length | |
3654 | ||
3655 | The maximum length of a constant string for a builtin string cmp call | |
3656 | eligible for inlining. | |
3657 | ||
3658 | .. gcc-param:: align-threshold | |
3659 | ||
3660 | Select fraction of the maximal frequency of executions of a basic block in | |
3661 | a function to align the basic block. | |
3662 | ||
3663 | .. gcc-param:: align-loop-iterations | |
3664 | ||
3665 | A loop expected to iterate at least the selected number of iterations is | |
3666 | aligned. | |
3667 | ||
3668 | .. gcc-param:: tracer-dynamic-coverage | |
3669 | tracer-dynamic-coverage-feedback | |
3670 | ||
3671 | This value is used to limit superblock formation once the given percentage of | |
3672 | executed instructions is covered. This limits unnecessary code size | |
3673 | expansion. | |
3674 | ||
3675 | The tracer-dynamic-coverage-feedback parameter | |
3676 | is used only when profile | |
3677 | feedback is available. The real profiles (as opposed to statically estimated | |
3678 | ones) are much less balanced, allowing the threshold to be a larger value. | |
3679 | ||
3680 | .. gcc-param:: tracer-max-code-growth | |
3681 | ||
3682 | Stop tail duplication once code growth has reached given percentage. This is | |
3683 | a rather artificial limit, as most of the duplicates are eliminated later in | |
3684 | cross jumping, so it may be set to much higher values than is the desired code | |
3685 | growth. | |
3686 | ||
3687 | .. gcc-param:: tracer-min-branch-ratio | |
3688 | ||
3689 | Stop reverse growth when the reverse probability of best edge is less than this | |
3690 | threshold (in percent). | |
3691 | ||
3692 | .. gcc-param:: tracer-min-branch-probability | |
3693 | tracer-min-branch-probability-feedback | |
3694 | ||
3695 | Stop forward growth if the best edge has probability lower than this | |
3696 | threshold. | |
3697 | ||
3698 | As with tracer-dynamic-coverage, two parameters are | |
3699 | provided. tracer-min-branch-probability-feedback is used for | |
3700 | compilation with profile feedback and tracer-min-branch-probability | |
3701 | for compilation without. The value for compilation with profile feedback | |
3702 | needs to be more conservative (higher) in order to make tracer | |
3703 | effective. | |
3704 | ||
3705 | .. gcc-param:: stack-clash-protection-guard-size | |
3706 | ||
3707 | Specify the size of the operating system provided stack guard as | |
3708 | 2 raised to :samp:`{num}` bytes. Higher values may reduce the | |
3709 | number of explicit probes, but a value larger than the operating system | |
3710 | provided guard will leave code vulnerable to stack clash style attacks. | |
3711 | ||
3712 | .. gcc-param:: stack-clash-protection-probe-interval | |
3713 | ||
3714 | Stack clash protection involves probing stack space as it is allocated. This | |
3715 | param controls the maximum distance between probes into the stack as 2 raised | |
3716 | to :samp:`{num}` bytes. Higher values may reduce the number of explicit probes, but a value | |
3717 | larger than the operating system provided guard will leave code vulnerable to | |
3718 | stack clash style attacks. | |
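Both stack clash parameters take an exponent rather than a byte count. The sketch below shows the conversion and the safety condition stated above. The helper names and the 4096-byte operating system guard used in the usage lines are hypothetical; the real guard size depends on the platform.

```python
def size_from_exponent(num: int) -> int:
    """Return the size in bytes selected by the exponent num (2 ** num)."""
    return 1 << num


def exceeds_os_guard(num: int, os_guard_bytes: int) -> bool:
    """Return True if the configured size is larger than the OS guard.

    Per the text, such a setting leaves code vulnerable to
    stack clash style attacks.
    """
    return size_from_exponent(num) > os_guard_bytes


# Hypothetical platform with a 4096-byte guard.
print(size_from_exponent(12))
print(exceeds_os_guard(16, 4096))
```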
3719 | ||
3720 | .. gcc-param:: max-cse-path-length | |
3721 | ||
3722 | The maximum number of basic blocks on a path that CSE considers. | |
3723 | ||
3724 | .. gcc-param:: max-cse-insns | |
3725 | ||
3726 | The maximum number of instructions CSE processes before flushing. | |
3727 | ||
3728 | .. gcc-param:: ggc-min-expand | |
3729 | ||
3730 | GCC uses a garbage collector to manage its own memory allocation. This | |
3731 | parameter specifies the minimum percentage by which the garbage | |
3732 | collector's heap should be allowed to expand between collections. | |
3733 | Tuning this may improve compilation speed; it has no effect on code | |
3734 | generation. | |
3735 | ||
3736 | The default is 30% + 70% \* (RAM/1GB) with an upper bound of 100% when | |
3737 | RAM >= 1GB. If ``getrlimit`` is available, the notion of 'RAM' is | |
3738 | the smallest of actual RAM and ``RLIMIT_DATA`` or ``RLIMIT_AS``. If | |
3739 | GCC is not able to calculate RAM on a particular platform, the lower | |
3740 | bound of 30% is used. Setting this parameter and | |
3741 | ggc-min-heapsize to zero causes a full collection to occur at | |
3742 | every opportunity. This is extremely slow, but can be useful for | |
3743 | debugging. | |
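The default formula above can be worked through with a short sketch. It models only the stated formula, not the ``getrlimit`` handling; the function name is hypothetical, and interpreting "1GB" as 2**30 bytes is an assumption.

```python
from typing import Optional

GIB = 1 << 30  # Assumption: "1GB" in the formula means 2**30 bytes.


def default_ggc_min_expand(ram_bytes: Optional[int] = None) -> float:
    """Return the default heap expansion percentage for a given RAM size.

    Implements 30% + 70% * (RAM / 1GB), capped at 100%. When RAM cannot
    be determined (None), the lower bound of 30% is used.
    """
    if ram_bytes is None:
        return 30.0
    return min(100.0, 30.0 + 70.0 * ram_bytes / GIB)


print(default_ggc_min_expand(GIB // 2))  # machine with 512 MiB
print(default_ggc_min_expand(4 * GIB))   # capped at the upper bound
print(default_ggc_min_expand())          # RAM unknown
```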
3744 | ||
3745 | .. gcc-param:: ggc-min-heapsize | |
3746 | ||
3747 | Minimum size of the garbage collector's heap before it begins bothering | |
3748 | to collect garbage. The first collection occurs after the heap expands | |
3749 | by ggc-min-expand % beyond ggc-min-heapsize. Again, | |
3750 | tuning this may improve compilation speed, and has no effect on code | |
3751 | generation. | |
3752 | ||
3753 | The default is the smaller of RAM/8, RLIMIT_RSS, or a limit that | |
3754 | tries to ensure that RLIMIT_DATA or RLIMIT_AS are not exceeded, but | |
3755 | with a lower bound of 4096 (four megabytes) and an upper bound of | |
3756 | 131072 (128 megabytes). If GCC is not able to calculate RAM on a | |
3757 | particular platform, the lower bound is used. Setting this parameter | |
3758 | very large effectively disables garbage collection. Setting this | |
3759 | parameter and ggc-min-expand to zero causes a full collection | |
3760 | to occur at every opportunity. | |
3761 | ||
3762 | .. gcc-param:: max-reload-search-insns | |
3763 | ||
3764 | The maximum number of instructions reload should look backward for an equivalent | |
3765 | register. Larger values mean more aggressive optimization, which increases | |
3766 | compilation time for probably slightly better performance. | |
3767 | ||
3768 | .. gcc-param:: max-cselib-memory-locations | |
3769 | ||
3770 | The maximum number of memory locations cselib should take into account. | |
3771 | Larger values mean more aggressive optimization, which increases compilation time | |
3772 | for probably slightly better performance. | |
3773 | ||
3774 | .. gcc-param:: max-sched-ready-insns | |
3775 | ||
3776 | The maximum number of instructions ready to be issued the scheduler should | |
3777 | consider at any given time during the first scheduling pass. Increasing | |
3778 | values mean more thorough searches, making the compilation time increase | |
3779 | with probably little benefit. | |
3780 | ||
3781 | .. gcc-param:: max-sched-region-blocks | |
3782 | ||
3783 | The maximum number of blocks in a region to be considered for | |
3784 | interblock scheduling. | |
3785 | ||
3786 | .. gcc-param:: max-pipeline-region-blocks | |
3787 | ||
3788 | The maximum number of blocks in a region to be considered for | |
3789 | pipelining in the selective scheduler. | |
3790 | ||
3791 | .. gcc-param:: max-sched-region-insns | |
3792 | ||
3793 | The maximum number of insns in a region to be considered for | |
3794 | interblock scheduling. | |
3795 | ||
3796 | .. gcc-param:: max-pipeline-region-insns | |
3797 | ||
3798 | The maximum number of insns in a region to be considered for | |
3799 | pipelining in the selective scheduler. | |
3800 | ||
3801 | .. gcc-param:: min-spec-prob | |
3802 | ||
3803 | The minimum probability (in percents) of reaching a source block | |
3804 | for interblock speculative scheduling. | |
3805 | ||
3806 | .. gcc-param:: max-sched-extend-regions-iters | |
3807 | ||
3808 | The maximum number of iterations through CFG to extend regions. | |
3809 | A value of 0 disables region extensions. | |
3810 | ||
3811 | .. gcc-param:: max-sched-insn-conflict-delay | |
3812 | ||
3813 | The maximum conflict delay for an insn to be considered for speculative motion. | |
3814 | ||
3815 | .. gcc-param:: sched-spec-prob-cutoff | |
3816 | ||
3817 | The minimal probability of speculation success (in percents), so that | |
3818 | speculative insns are scheduled. | |
3819 | ||
3820 | .. gcc-param:: sched-state-edge-prob-cutoff | |
3821 | ||
3822 | The minimum probability an edge must have for the scheduler to save its | |
3823 | state across it. | |
3824 | ||
3825 | .. gcc-param:: sched-mem-true-dep-cost | |
3826 | ||
3827 | Minimal distance (in CPU cycles) between store and load targeting same | |
3828 | memory locations. | |
3829 | ||
3830 | .. gcc-param:: selsched-max-lookahead | |
3831 | ||
3832 | The maximum size of the lookahead window of selective scheduling. It is a | |
3833 | depth of search for available instructions. | |
3834 | ||
3835 | .. gcc-param:: selsched-max-sched-times | |
3836 | ||
3837 | The maximum number of times that an instruction is scheduled during | |
3838 | selective scheduling. This is the limit on the number of iterations | |
3839 | through which the instruction may be pipelined. | |
3840 | ||
3841 | .. gcc-param:: selsched-insns-to-rename | |
3842 | ||
3843 | The maximum number of best instructions in the ready list that are considered | |
3844 | for renaming in the selective scheduler. | |
3845 | ||
3846 | .. gcc-param:: sms-min-sc | |
3847 | ||
3848 | The minimum value of stage count that swing modulo scheduler | |
3849 | generates. | |
3850 | ||
3851 | .. gcc-param:: max-last-value-rtl | |
3852 | ||
3853 | The maximum size, measured as a number of RTLs, of an expression that the | |
3854 | combiner can record as the last known value of a pseudo register. | |
3855 | ||
3856 | .. gcc-param:: max-combine-insns | |
3857 | ||
3858 | The maximum number of instructions the RTL combiner tries to combine. | |
3859 | ||
3860 | .. gcc-param:: integer-share-limit | |
3861 | ||
3862 | Small integer constants can use a shared data structure, reducing the | |
3863 | compiler's memory usage and increasing its speed. This sets the maximum | |
3864 | value of a shared integer constant. | |
3865 | ||
3866 | .. gcc-param:: ssp-buffer-size | |
3867 | ||
3868 | The minimum size of buffers (i.e. arrays) that receive stack smashing | |
3869 | protection when :option:`-fstack-protector` is used. | |
3870 | ||
3871 | .. gcc-param:: min-size-for-stack-sharing | |
3872 | ||
3873 | The minimum size of variables taking part in stack slot sharing when not | |
3874 | optimizing. | |
3875 | ||
3876 | .. gcc-param:: max-jump-thread-duplication-stmts | |
3877 | ||
3878 | Maximum number of statements allowed in a block that needs to be | |
3879 | duplicated when threading jumps. | |
3880 | ||
3881 | .. gcc-param:: max-jump-thread-paths | |
3882 | ||
3883 | The maximum number of paths to consider when searching for jump threading | |
3884 | opportunities. When arriving at a block, incoming edges are only considered | |
3885 | if the number of paths to be searched so far multiplied by the number of | |
3886 | incoming edges does not exhaust the specified maximum number of paths to | |
3887 | consider. | |
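
The budget rule described for ``max-jump-thread-paths`` can be pictured with a small sketch. This is an illustration only, not GCC's implementation: the function name ``may_consider_incoming_edges`` is hypothetical, and treating a product exactly equal to the limit as still allowed is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the GCC sources.
// Returns true when the incoming edges of a block may still be explored:
// the number of paths searched so far, multiplied by the number of
// incoming edges, must stay within the configured maximum.
// Assumption: a product equal to max_paths is still acceptable.
bool may_consider_incoming_edges(std::uint64_t paths_so_far,
                                 std::uint64_t incoming_edges,
                                 std::uint64_t max_paths)
{
  return paths_so_far * incoming_edges <= max_paths;
}
```

With a budget of 64 paths, a block reached after 4 paths and having 2 incoming edges would still be explored, while one reached after 40 paths would not.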
3888 | ||
3889 | .. gcc-param:: max-fields-for-field-sensitive | |
3890 | ||
3891 | Maximum number of fields in a structure treated in | |
3892 | a field-sensitive manner during pointer analysis. | |
3893 | ||
3894 | .. gcc-param:: prefetch-latency | |
3895 | ||
3896 | An estimate of the average number of instructions that are executed before | |
3897 | a prefetch finishes. The distance prefetched ahead is proportional | |
3898 | to this constant. Increasing this number may also lead to fewer | |
3899 | streams being prefetched (see :gcc-param:`simultaneous-prefetches`). | |
3900 | ||
3901 | .. gcc-param:: simultaneous-prefetches | |
3902 | ||
3903 | Maximum number of prefetches that can run at the same time. | |
3904 | ||
3905 | .. gcc-param:: l1-cache-line-size | |
3906 | ||
3907 | The size of a cache line in the L1 data cache, in bytes. | |
3908 | ||
3909 | .. gcc-param:: l1-cache-size | |
3910 | ||
3911 | The size of the L1 data cache, in kilobytes. | |
3912 | ||
3913 | .. gcc-param:: l2-cache-size | |
3914 | ||
3915 | The size of the L2 data cache, in kilobytes. | |
3916 | ||
3917 | .. gcc-param:: prefetch-dynamic-strides | |
3918 | ||
3919 | Whether the loop array prefetch pass should issue software prefetch hints | |
3920 | for strides that are non-constant. In some cases this may be | |
3921 | beneficial, though the fact that the stride is non-constant may make it | |
3922 | hard to predict when there is a clear benefit to issuing these hints. | |
3923 | ||
3924 | Set to 1 if the prefetch hints should be issued for non-constant | |
3925 | strides. Set to 0 if prefetch hints should be issued only for strides that | |
3926 | are known to be constant and below prefetch-minimum-stride. | |
3927 | ||
3928 | .. gcc-param:: prefetch-minimum-stride | |
3929 | ||
3930 | Minimum constant stride, in bytes, to start using prefetch hints for. If | |
3931 | the stride is less than this threshold, prefetch hints will not be issued. | |
3932 | ||
3933 | This setting is useful for processors that have hardware prefetchers, in | |
3934 | which case there may be conflicts between the hardware prefetchers and | |
3935 | the software prefetchers. If the hardware prefetchers have a maximum | |
3936 | stride they can handle, it should be used here to improve the use of | |
3937 | software prefetchers. | |
3938 | ||
3939 | A value of -1 means we don't have a threshold and therefore | |
3940 | prefetch hints can be issued for any constant stride. | |
3941 | ||
3942 | This setting is only useful for strides that are known and constant. | |
3943 | ||
3944 | .. gcc-param:: destructive-interference-size | |
3945 | constructive-interference-size | |
3946 | ||
3947 | The values for the C++17 variables | |
3948 | ``std::hardware_destructive_interference_size`` and | |
3949 | ``std::hardware_constructive_interference_size``. The destructive | |
3950 | interference size is the minimum recommended offset between two | |
3951 | independent concurrently-accessed objects; the constructive | |
3952 | interference size is the maximum recommended size of contiguous memory | |
3953 | accessed together. Typically both will be the size of an L1 cache | |
3954 | line for the target, in bytes. For a generic target covering a range of L1 | |
3955 | cache line sizes, typically the constructive interference size will be | |
3956 | the small end of the range and the destructive size will be the large | |
3957 | end. | |
3958 | ||
3959 | The destructive interference size is intended to be used for layout, | |
3960 | and thus has ABI impact. The default value is not expected to be | |
3961 | stable, and on some targets varies with :option:`-mtune`, so use of | |
3962 | this variable in a context where ABI stability is important, such as | |
3963 | the public interface of a library, is strongly discouraged; if it is | |
3964 | used in that context, users can stabilize the value using this | |
3965 | option. | |
3966 | ||
3967 | The constructive interference size is less sensitive, as it is | |
3968 | typically only used in a :samp:`static_assert` to make sure that a type | |
3969 | fits within a cache line. | |
3970 | ||
3971 | See also :option:`-Winterference-size`. | |
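
As a minimal sketch of how the two C++17 constants are typically used, the example below separates two independently updated counters and checks that a small type fits within the constructive size. The type names ``PerThreadCounters`` and ``Header``, and the fallback value of 64 for toolchains that do not provide the constants, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <new>

// Use the C++17 constants when the library provides them; the fallback
// value of 64 is an assumption, not a property of any particular target.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t destructive =
    std::hardware_destructive_interference_size;
inline constexpr std::size_t constructive =
    std::hardware_constructive_interference_size;
#else
inline constexpr std::size_t destructive = 64;
inline constexpr std::size_t constructive = 64;
#endif

// Layout use: keep two independently written counters at least the
// destructive interference size apart to avoid false sharing.
struct PerThreadCounters {
  alignas(destructive) long produced;
  alignas(destructive) long consumed;
};

// Constructive use: check that data accessed together fits in the
// recommended contiguous size.
struct Header {
  int id;
  int flags;
};
static_assert(sizeof(Header) <= constructive,
              "Header should fit within the constructive interference size");
```

Because the layout of ``PerThreadCounters`` depends on the destructive interference size, a type like this is best kept out of a library's public interface unless the value is pinned with this parameter.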
3972 | ||
3973 | .. gcc-param:: loop-interchange-max-num-stmts | |
3974 | ||
3975 | The maximum number of statements in a loop to be interchanged. | |
3976 | ||
3977 | .. gcc-param:: loop-interchange-stride-ratio | |
3978 | ||
3979 | The minimum ratio between the strides of two loops for interchange to be profitable. | |
3980 | ||
3981 | .. gcc-param:: min-insn-to-prefetch-ratio | |
3982 | ||
3983 | The minimum ratio between the number of instructions and the | |
3984 | number of prefetches to enable prefetching in a loop. | |
3985 | ||
3986 | .. gcc-param:: prefetch-min-insn-to-mem-ratio | |
3987 | ||
3988 | The minimum ratio between the number of instructions and the | |
3989 | number of memory references to enable prefetching in a loop. | |
3990 | ||
3991 | .. gcc-param:: use-canonical-types | |
3992 | ||
3993 | Whether the compiler should use the 'canonical' type system. | |
3994 | Should always be 1, which uses a more efficient internal | |
3995 | mechanism for comparing types in C++ and Objective-C++. However, if | |
3996 | bugs in the canonical type system are causing compilation failures, | |
3997 | set this value to 0 to disable canonical types. | |
3998 | ||
3999 | .. gcc-param:: switch-conversion-max-branch-ratio | |
4000 | ||
4001 | Switch initialization conversion refuses to create arrays that are | |
4002 | bigger than switch-conversion-max-branch-ratio times the number of | |
4003 | branches in the switch. | |
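
The limit can be read as a simple size check. The sketch below is not GCC's code: the name ``switch_array_too_big`` is hypothetical, and measuring the array in elements is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the GCC sources.
// Returns true when a lookup array with array_elements entries would be
// bigger than max_branch_ratio times the number of branches in the switch,
// in which case the conversion would be refused.
bool switch_array_too_big(std::uint64_t array_elements,
                          std::uint64_t branch_count,
                          std::uint64_t max_branch_ratio)
{
  return array_elements > max_branch_ratio * branch_count;
}
```

For a switch with 4 branches and an illustrative ratio of 8, an array of 32 entries would be accepted and one of 33 entries would be refused.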
4004 | ||
4005 | .. gcc-param:: max-partial-antic-length | |
4006 | ||
4007 | Maximum length of the partial antic set computed during the tree | |
4008 | partial redundancy elimination optimization (:option:`-ftree-pre`) when | |
4009 | optimizing at :option:`-O3` and above. For some sorts of source code | |
4010 | the enhanced partial redundancy elimination optimization can run away, | |
4011 | consuming all of the memory available on the host machine. This | |
4012 | parameter sets a limit on the length of the sets that are computed, | |
4013 | which prevents the runaway behavior. Setting a value of 0 for | |
4014 | this parameter allows an unlimited set length. | |
4015 | ||
4016 | .. gcc-param:: rpo-vn-max-loop-depth | |
4017 | ||
4018 | Maximum loop depth that is value-numbered optimistically. | |
4019 | When the limit is hit, the innermost | |
4020 | :samp:`{rpo-vn-max-loop-depth}` loops and the outermost loop in the | |
4021 | loop nest are value-numbered optimistically and the remaining ones are not. | |
4022 | ||
4023 | .. gcc-param:: sccvn-max-alias-queries-per-access | |
4024 | ||
4025 | Maximum number of alias-oracle queries we perform when looking for | |
4026 | redundancies for loads and stores. If this limit is hit the search | |
4027 | is aborted and the load or store is not considered redundant. The | |
4028 | number of queries is algorithmically limited to the number of | |
4029 | stores on all paths from the load to the function entry. | |
4030 | ||
4031 | .. gcc-param:: ira-max-loops-num | |
4032 | ||
4033 | IRA uses regional register allocation by default. If a function | |
4034 | contains more loops than the number given by this parameter, only at most | |
4035 | the given number of the most frequently-executed loops form regions | |
4036 | for regional register allocation. | |
4037 | ||
4038 | .. gcc-param:: ira-max-conflict-table-size | |
4039 | ||
4040 | Although IRA uses a sophisticated algorithm to compress the conflict | |
4041 | table, the table can still require excessive amounts of memory for | |
4042 | huge functions. If the conflict table for a function could be more | |
4043 | than the size in MB given by this parameter, the register allocator | |
4044 | instead uses a faster, simpler, and lower-quality | |
4045 | algorithm that does not require building a pseudo-register conflict table. | |
4046 | ||
4047 | .. gcc-param:: ira-loop-reserved-regs | |
4048 | ||
4049 | IRA can be used to evaluate more accurate register pressure in loops | |
4050 | for decisions to move loop invariants (see :option:`-O3`). The number | |
4051 | of available registers reserved for some other purposes is given | |
4052 | by this parameter. The default value | |
4053 | is the best found in numerous experiments. | |
4054 | ||
4055 | .. gcc-param:: ira-consider-dup-in-all-alts | |
4056 | ||
4057 | Make IRA consider the matching constraint (duplicated operand number) | |
4058 | heavily in all available alternatives for the preferred register class. | |
4059 | If it is set to zero, IRA only respects the matching | |
4060 | constraint when it is in the only available alternative with an | |
4061 | appropriate register class. Otherwise, IRA checks all | |
4062 | available alternatives for the preferred register class, even if it has | |
4063 | already found a choice with an appropriate register class, and respects the | |
4064 | qualifying matching constraint it finds. | |
4065 | ||
4066 | .. gcc-param:: lra-inheritance-ebb-probability-cutoff | |
4067 | ||
4068 | LRA tries to reuse values reloaded in registers in subsequent insns. | |
4069 | This optimization is called inheritance. EBB is used as a region to | |
4070 | do this optimization. The parameter defines the minimum fall-through | |
4071 | edge probability, as a percentage, required to add a BB to the inheritance EBB in | |
4072 | LRA. The default value was chosen | |
4073 | from numerous runs of SPEC2000 on x86-64. | |
4074 | ||
4075 | .. gcc-param:: loop-invariant-max-bbs-in-loop | |
4076 | ||
4077 | Loop invariant motion can be very expensive, both in compilation time and | |
4078 | in amount of needed compile-time memory, with very large loops. Loops | |
4079 | with more basic blocks than this parameter won't have loop invariant | |
4080 | motion optimization performed on them. | |
4081 | ||
4082 | .. gcc-param:: loop-max-datarefs-for-datadeps | |
4083 | ||
4084 | Building data dependencies is expensive for very large loops. This | |
4085 | parameter limits the number of data references in loops that are | |
4086 | considered for data dependence analysis. Loops that exceed this limit are not | |
4087 | handled by the optimizations using loop data dependencies. | |
4088 | ||
4089 | .. gcc-param:: max-vartrack-size | |
4090 | ||
4091 | Sets a maximum number of hash table slots to use during variable | |
4092 | tracking dataflow analysis of any function. If this limit is exceeded | |
4093 | with variable tracking at assignments enabled, analysis for that | |
4094 | function is retried without it, after removing all debug insns from | |
4095 | the function. If the limit is exceeded even without debug insns, var | |
4096 | tracking analysis is completely disabled for the function. Setting | |
4097 | the parameter to zero makes it unlimited. | |
4098 | ||
4099 | .. gcc-param:: max-vartrack-expr-depth | |
4100 | ||
4101 | Sets a maximum number of recursion levels when attempting to map | |
4102 | variable names or debug temporaries to value expressions. This trades | |
4103 | compilation time for more complete debug information. If this is set too | |
4104 | low, value expressions that are available and could be represented in | |
4105 | debug information may end up not being used; setting this higher may | |
4106 | enable the compiler to find more complex debug expressions, but compile | |
4107 | time and memory use may grow. | |
4108 | ||
4109 | .. gcc-param:: max-debug-marker-count | |
4110 | ||
4111 | Sets a threshold on the number of debug markers (e.g. begin stmt | |
4112 | markers) to avoid complexity explosion at inlining or expanding to RTL. | |
4113 | If a function has more such gimple stmts than the set limit, such stmts | |
4114 | will be dropped from the inlined copy of a function, and from its RTL | |
4115 | expansion. | |
4116 | ||
4117 | .. gcc-param:: min-nondebug-insn-uid | |
4118 | ||
4119 | Use uids starting at this parameter for nondebug insns. The range below | |
4120 | the parameter is reserved exclusively for debug insns created by | |
4121 | :option:`-fvar-tracking-assignments`, but debug insns may get | |
4122 | (non-overlapping) uids above it if the reserved range is exhausted. | |
4123 | ||
4124 | .. gcc-param:: ipa-sra-ptr-growth-factor | |
4125 | ||
4126 | IPA-SRA replaces a pointer to an aggregate with one or more new | |
4127 | parameters only when their cumulative size is less than or equal to | |
4128 | ipa-sra-ptr-growth-factor times the size of the original | |
4129 | pointer parameter. | |
4130 | ||
4131 | .. gcc-param:: ipa-sra-max-replacements | |
4132 | ||
4133 | Maximum pieces of an aggregate that IPA-SRA tracks. As a | |
4134 | consequence, it is also the maximum number of replacements of a formal | |
4135 | parameter. | |
4136 | ||
4137 | .. gcc-param:: sra-max-scalarization-size-Ospeed | |
4138 | sra-max-scalarization-size-Osize | |
4139 | ||
4140 | The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to | |
4141 | replace scalar parts of aggregates with uses of independent scalar | |
4142 | variables. These parameters control the maximum size, in storage units, | |
4143 | of an aggregate that is considered for replacement when compiling for | |
4144 | speed | |
4145 | (:gcc-param:`sra-max-scalarization-size-Ospeed`) or size | |
4146 | (:gcc-param:`sra-max-scalarization-size-Osize`), respectively. | |
4147 | ||
4148 | .. gcc-param:: sra-max-propagations | |
4149 | ||
4150 | The maximum number of artificial accesses that Scalar Replacement of | |
4151 | Aggregates (SRA) will track, per local variable, in order to | |
4152 | facilitate copy propagation. | |
4153 | ||
4154 | .. gcc-param:: tm-max-aggregate-size | |
4155 | ||
4156 | When making copies of thread-local variables in a transaction, this | |
4157 | parameter specifies the size in bytes after which variables are | |
4158 | saved with the logging functions as opposed to save/restore code | |
4159 | sequence pairs. This option only applies when using | |
4160 | :option:`-fgnu-tm`. | |
4161 | ||
4162 | .. gcc-param:: graphite-max-nb-scop-params | |
4163 | ||
4164 | To avoid exponential effects in the Graphite loop transforms, the | |
4165 | number of parameters in a Static Control Part (SCoP) is bounded. | |
4166 | A value of zero can be used to lift | |
4167 | the bound. A variable whose value is unknown at compilation time and | |
4168 | defined outside a SCoP is a parameter of the SCoP. | |
4169 | ||
4170 | .. gcc-param:: loop-block-tile-size | |
4171 | ||
4172 | Loop blocking or strip mining transforms, enabled with | |
4173 | :option:`-floop-block` or :option:`-floop-strip-mine`, strip mine each | |
4174 | loop in the loop nest by a given number of iterations. The strip | |
4175 | length can be changed using the loop-block-tile-size | |
4176 | parameter. | |
4177 | ||
4178 | .. gcc-param:: ipa-jump-function-lookups | |
4179 | ||
4180 | Specifies the number of statements visited during jump function offset discovery. | |
4181 | ||
4182 | .. gcc-param:: ipa-cp-value-list-size | |
4183 | ||
4184 | IPA-CP attempts to track all possible values and types passed to a function's | |
4185 | parameter in order to propagate them and perform devirtualization. | |
4186 | :gcc-param:`ipa-cp-value-list-size` is the maximum number of values and types it | |
4187 | stores per formal parameter of a function. | |
4188 | ||
4189 | .. gcc-param:: ipa-cp-eval-threshold | |
4190 | ||
4191 | IPA-CP calculates its own score of cloning profitability heuristics | |
4192 | and performs those cloning opportunities with scores that exceed | |
4193 | :gcc-param:`ipa-cp-eval-threshold`. | |
4194 | ||
4195 | .. gcc-param:: ipa-cp-max-recursive-depth | |
4196 | ||
4197 | Maximum depth of recursive cloning for a self-recursive function. | |
4198 | ||
4199 | .. gcc-param:: ipa-cp-min-recursive-probability | |
4200 | ||
4201 | Recursive cloning is performed only when the probability of the call being | |
4202 | executed exceeds this parameter. | |
4203 | ||
4204 | .. gcc-param:: ipa-cp-profile-count-base | |
4205 | ||
4206 | When using the :option:`-fprofile-use` option, IPA-CP considers the measured | |
4207 | execution count of the call graph edge at this percentage position in the | |
4208 | histogram as the basis for its heuristics calculation. | |
4209 | ||
4210 | .. gcc-param:: ipa-cp-recursive-freq-factor | |
4211 | ||
4212 | The number of times interprocedural copy propagation expects recursive | |
4213 | functions to call themselves. | |
4214 | ||
4215 | .. gcc-param:: ipa-cp-recursion-penalty | |
4216 | ||
4217 | The percentage penalty that recursive functions receive when they | |
4218 | are evaluated for cloning. | |
4219 | ||
4220 | .. gcc-param:: ipa-cp-single-call-penalty | |
4221 | ||
4222 | The percentage penalty that functions containing a single call to another | |
4223 | function receive when they are evaluated for cloning. | |
4224 | ||
4225 | .. gcc-param:: ipa-max-agg-items | |
4226 | ||
4227 | IPA-CP can also propagate a number of scalar values passed | |
4228 | in an aggregate. :gcc-param:`ipa-max-agg-items` controls the maximum | |
4229 | number of such values per parameter. | |
4230 | ||
4231 | .. gcc-param:: ipa-cp-loop-hint-bonus | |
4232 | ||
4233 | When IPA-CP determines that a cloning candidate would make the number | |
4234 | of iterations of a loop known, it adds a bonus of | |
4235 | ipa-cp-loop-hint-bonus to the profitability score of | |
4236 | the candidate. | |
4237 | ||
4238 | .. gcc-param:: ipa-max-loop-predicates | |
4239 | ||
4240 | The maximum number of different predicates IPA will use to describe when | |
4241 | loops in a function have known properties. | |
4242 | ||
4243 | .. gcc-param:: ipa-max-aa-steps | |
4244 | ||
4245 | During its analysis of function bodies, IPA-CP employs alias analysis | |
4246 | in order to track values pointed to by function parameters. In order | |
4247 | not to spend too much time analyzing huge functions, it gives up and | |
4248 | considers all memory clobbered after examining | |
4249 | :gcc-param:`ipa-max-aa-steps` statements modifying memory. | |
4250 | ||
4251 | .. gcc-param:: ipa-max-switch-predicate-bounds | |
4252 | ||
4253 | The maximum number of boundary endpoints of case ranges of a switch statement. | |
4254 | For a switch exceeding this limit, IPA-CP does not construct the cloning cost | |
4255 | predicate, which is used to estimate cloning benefit, for the default case | |
4256 | of the switch statement. | |
4257 | ||
4258 | .. gcc-param:: ipa-max-param-expr-ops | |
4259 | ||
4260 | IPA-CP analyzes conditional statements that reference a function | |
4261 | parameter to estimate the benefit of cloning for a certain constant value. | |
4262 | If the number of operations in a parameter expression exceeds | |
4263 | :gcc-param:`ipa-max-param-expr-ops`, the expression is treated as too | |
4264 | complicated and is not handled by IPA analysis. | |
4265 | ||
4266 | .. gcc-param:: lto-partitions | |
4267 | ||
4268 | Specify the desired number of partitions produced during WHOPR compilation. | |
4269 | The number of partitions should exceed the number of CPUs used for compilation. | |
4270 | ||
4271 | .. gcc-param:: lto-min-partition | |
4272 | ||
4273 | The minimum partition size for WHOPR (in estimated instructions). | |
4274 | This avoids the expense of splitting very small programs into too many | |
4275 | partitions. | |
4276 | ||
4277 | .. gcc-param:: lto-max-partition | |
4278 | ||
4279 | The maximum partition size for WHOPR (in estimated instructions), | |
4280 | which provides an upper bound on the size of an individual partition. | |
4281 | Meant to be used only with balanced partitioning. | |
4282 | ||
4283 | .. gcc-param:: lto-max-streaming-parallelism | |
4284 | ||
4285 | Maximal number of parallel processes used for LTO streaming. | |
4286 | ||
4287 | .. gcc-param:: cxx-max-namespaces-for-diagnostic-help | |
4288 | ||
4289 | The maximum number of namespaces to consult for suggestions when C++ | |
4290 | name lookup fails for an identifier. | |
4291 | ||
4292 | .. gcc-param:: sink-frequency-threshold | |
4293 | ||
4294 | The maximum execution frequency (in percent) of the target block | |
4295 | relative to a statement's original block to allow sinking of the | |
4296 | statement. Larger numbers result in more aggressive statement sinking. | |
4297 | A small positive adjustment is applied for | |
4298 | statements with memory operands, as those are even more profitable to sink. | |
4299 | ||
4300 | .. gcc-param:: max-stores-to-sink | |
4301 | ||
4302 | The maximum number of conditional store pairs that can be sunk. Set to 0 | |
4303 | if either vectorization (:option:`-ftree-vectorize`) or if-conversion | |
4304 | (:option:`-ftree-loop-if-convert`) is disabled. | |
4305 | ||
4306 | .. gcc-param:: case-values-threshold | |
4307 | ||
4308 | The smallest number of different values for which it is best to use a | |
4309 | jump-table instead of a tree of conditional branches. If the value is | |
4310 | 0, use the default for the machine. | |
4311 | ||
4312 | .. gcc-param:: jump-table-max-growth-ratio-for-size | |
4313 | ||
4314 | The maximum code size growth ratio when expanding | |
4315 | into a jump table (in percent). The parameter is used when | |
4316 | optimizing for size. | |
4317 | ||
4318 | .. gcc-param:: jump-table-max-growth-ratio-for-speed | |
4319 | ||
4320 | The maximum code size growth ratio when expanding | |
4321 | into a jump table (in percent). The parameter is used when | |
4322 | optimizing for speed. | |
4323 | ||
4324 | .. gcc-param:: tree-reassoc-width | |
4325 | ||
4326 | Set the maximum number of instructions executed in parallel in | |
4327 | a reassociated tree. This parameter overrides the target-dependent | |
4328 | heuristics used by default if it has a nonzero value. | |
4329 | ||
4330 | .. gcc-param:: sched-pressure-algorithm | |
4331 | ||
4332 | Choose between the two available implementations of | |
4333 | :option:`-fsched-pressure`. Algorithm 1 is the original implementation | |
4334 | and is the more likely to prevent instructions from being reordered. | |
4335 | Algorithm 2 was designed to be a compromise between the relatively | |
4336 | conservative approach taken by algorithm 1 and the rather aggressive | |
4337 | approach taken by the default scheduler. It relies more heavily on | |
4338 | having a regular register file and accurate register pressure classes. | |
4339 | See :samp:`haifa-sched.cc` in the GCC sources for more details. | |
4340 | ||
4341 | The default choice depends on the target. | |
4342 | ||
4343 | .. gcc-param:: max-slsr-cand-scan | |
4344 | ||
4345 | Set the maximum number of existing candidates that are considered when | |
4346 | seeking a basis for a new straight-line strength reduction candidate. | |
4347 | ||
4348 | .. gcc-param:: asan-globals | |
4349 | ||
4350 | Enable buffer overflow detection for global objects. This kind | |
4351 | of protection is enabled by default when using the | |
4352 | :option:`-fsanitize=address` option. | |
4353 | To disable protection of global objects use :option:`--param` :gcc-param:`asan-globals`:samp:`=0`. | |
4354 | ||
4355 | .. gcc-param:: asan-stack | |
4356 | ||
4357 | Enable buffer overflow detection for stack objects. This kind of | |
4358 | protection is enabled by default when using :option:`-fsanitize=address`. | |
4359 | To disable stack protection use :option:`--param` :gcc-param:`asan-stack`:samp:`=0`. | |
4360 | ||
4361 | .. gcc-param:: asan-instrument-reads | |
4362 | ||
4363 | Enable buffer overflow detection for memory reads. This kind of | |
4364 | protection is enabled by default when using :option:`-fsanitize=address`. | |
4365 | To disable memory reads protection use | |
4366 | :option:`--param` :gcc-param:`asan-instrument-reads`:samp:`=0`. | |
4367 | ||
4368 | .. gcc-param:: asan-instrument-writes | |
4369 | ||
4370 | Enable buffer overflow detection for memory writes. This kind of | |
4371 | protection is enabled by default when using :option:`-fsanitize=address`. | |
4372 | To disable memory writes protection use | |
4373 | :option:`--param` :gcc-param:`asan-instrument-writes`:samp:`=0`. | |
4374 | ||
4375 | .. gcc-param:: asan-memintrin | |
4376 | ||
4377 | Enable detection for built-in functions. This kind of protection | |
4378 | is enabled by default when using :option:`-fsanitize=address`. | |
4379 | To disable built-in functions protection use | |
4380 | :option:`--param` :gcc-param:`asan-memintrin`:samp:`=0`. | |
4381 | ||
4382 | .. gcc-param:: asan-use-after-return | |
4383 | ||
4384 | Enable detection of use-after-return. This kind of protection | |
4385 | is enabled by default when using the :option:`-fsanitize=address` option. | |
4386 | To disable it use :option:`--param` :gcc-param:`asan-use-after-return`:samp:`=0`. | |
4387 | ||
4388 | .. note:: | |
4389 | ||
4390 | By default the check is disabled at run time. To enable it, | |
4391 | add ``detect_stack_use_after_return=1`` to the environment variable | |
4392 | :envvar:`ASAN_OPTIONS`. | |
4393 | ||
4394 | .. gcc-param:: asan-instrumentation-with-call-threshold | |
4395 | ||
4396 | If the number of memory accesses in the function being instrumented | |
4397 | is greater than or equal to this number, use callbacks instead of inline checks. | |
4398 | For example, to disable inline code use | |
4399 | :option:`--param` :gcc-param:`asan-instrumentation-with-call-threshold`:samp:`=0`. | |
4400 | ||
4401 | .. gcc-param:: hwasan-instrument-stack | |
4402 | ||
4403 | Enable hwasan instrumentation of statically sized stack-allocated variables. | |
4404 | This kind of instrumentation is enabled by default when using | |
4405 | :option:`-fsanitize=hwaddress` and disabled by default when using | |
4406 | :option:`-fsanitize=kernel-hwaddress`. | |
4407 | To disable stack instrumentation use | |
4408 | :option:`--param` :gcc-param:`hwasan-instrument-stack`:samp:`=0`, and to enable it use | |
4409 | :option:`--param` :gcc-param:`hwasan-instrument-stack`:samp:`=1`. | |
4410 | ||
4411 | .. gcc-param:: hwasan-random-frame-tag | |
4412 | ||
4413 | When using stack instrumentation, decide tags for stack variables using a | |
4414 | deterministic sequence beginning at a random tag for each frame. With this | |
4415 | parameter unset, tags are chosen using the same sequence but beginning from 1. | |
4416 | This is enabled by default for :option:`-fsanitize=hwaddress` and unavailable | |
4417 | for :option:`-fsanitize=kernel-hwaddress`. | |
4418 | To disable it use :option:`--param` :gcc-param:`hwasan-random-frame-tag`:samp:`=0`. | |
4419 | ||
4420 | .. gcc-param:: hwasan-instrument-allocas | |
4421 | ||
4422 | Enable hwasan instrumentation of dynamically sized stack-allocated variables. | |
4423 | This kind of instrumentation is enabled by default when using | |
4424 | :option:`-fsanitize=hwaddress` and disabled by default when using | |
4425 | :option:`-fsanitize=kernel-hwaddress`. | |
4426 | To disable instrumentation of such variables use | |
4427 | :option:`--param` :gcc-param:`hwasan-instrument-allocas`:samp:`=0`, and to enable it use | |
4428 | :option:`--param` :gcc-param:`hwasan-instrument-allocas`:samp:`=1`. | |
4429 | ||
4430 | .. gcc-param:: hwasan-instrument-reads | |
4431 | ||
4432 | Enable hwasan checks on memory reads. Instrumentation of reads is enabled by | |
4433 | default for both :option:`-fsanitize=hwaddress` and | |
4434 | :option:`-fsanitize=kernel-hwaddress`. | |
4435 | To disable checking memory reads use | |
4436 | :option:`--param` :gcc-param:`hwasan-instrument-reads`:samp:`=0`. | |
4437 | ||
4438 | .. gcc-param:: hwasan-instrument-writes | |
4439 | ||
4440 | Enable hwasan checks on memory writes. Instrumentation of writes is enabled by | |
4441 | default for both :option:`-fsanitize=hwaddress` and | |
4442 | :option:`-fsanitize=kernel-hwaddress`. | |
4443 | To disable checking memory writes use | |
4444 | :option:`--param` :gcc-param:`hwasan-instrument-writes`:samp:`=0`. | |
4445 | ||
4446 | .. gcc-param:: hwasan-instrument-mem-intrinsics | |
4447 | ||
4448 | Enable hwasan instrumentation of builtin functions. Instrumentation of these | |
4449 | builtin functions is enabled by default for both :option:`-fsanitize=hwaddress` | |
4450 | and :option:`-fsanitize=kernel-hwaddress`. | |
4451 | To disable instrumentation of builtin functions use | |
4452 | :option:`--param` :gcc-param:`hwasan-instrument-mem-intrinsics`:samp:`=0`. | |
4453 | ||
4454 | .. gcc-param:: use-after-scope-direct-emission-threshold | |
4455 | ||
4456 | If the size of a local variable in bytes is less than or equal to this | |
4457 | number, directly poison (or unpoison) shadow memory instead of using | |
4458 | run-time callbacks. | |
4459 | ||
4460 | .. gcc-param:: tsan-distinguish-volatile | |
4461 | ||
4462 | Emit special instrumentation for accesses to volatiles. | |
4463 | ||
4464 | .. gcc-param:: tsan-instrument-func-entry-exit | |
4465 | ||
4466 | Emit instrumentation calls to :samp:`__tsan_func_entry()` and :samp:`__tsan_func_exit()`. | |
4467 | ||
4468 | .. gcc-param:: max-fsm-thread-path-insns | |
4469 | ||
4470 | Maximum number of instructions to copy when duplicating blocks on a | |
4471 | finite state automaton jump thread path. | |
4472 | ||
4473 | .. gcc-param:: threader-debug | |
4474 | ||
4475 | threader-debug=[none|all] | |
4476 | Enables verbose dumping of the threader solver. | |
4477 | ||
4478 | .. gcc-param:: parloops-chunk-size | |
4479 | ||
4480 | Chunk size of omp schedule for loops parallelized by parloops. | |
4481 | ||
4482 | .. gcc-param:: parloops-schedule | |
4483 | ||
4484 | Schedule type of omp schedule for loops parallelized by parloops (static, | |
4485 | dynamic, guided, auto, runtime). | |
4486 | ||
4487 | .. gcc-param:: parloops-min-per-thread | |
4488 | ||
4489 | The minimum number of iterations per thread of an innermost parallelized | |
4490 | loop for which the parallelized variant is preferred over the single threaded | |
4491 | one. Note that for a parallelized loop nest the | |
4492 | minimum number of iterations of the outermost loop per thread is two. | |
4493 | ||
4494 | .. gcc-param:: max-ssa-name-query-depth | |
4495 | ||
4496 | Maximum depth of recursion when querying properties of SSA names in things | |
4497 | like fold routines. One level of recursion corresponds to following a | |
4498 | use-def chain. | |
4499 | ||
4500 | .. gcc-param:: max-speculative-devirt-maydefs | |
4501 | ||
4502 | The maximum number of may-defs we analyze when looking for a must-def | |
4503 | specifying the dynamic type of an object that invokes a virtual call | |
4504 | we may be able to devirtualize speculatively. | |
4505 | ||
4506 | .. gcc-param:: max-vrp-switch-assertions | |
4507 | ||
4508 | The maximum number of assertions to add along the default edge of a switch | |
4509 | statement during VRP. | |
4510 | ||
4511 | .. gcc-param:: evrp-sparse-threshold | |
4512 | ||
4513 | Maximum number of basic blocks before EVRP uses a sparse cache. | |
4514 | ||
4515 | .. gcc-param:: vrp1-mode | |
4516 | ||
4517 | Specifies the mode VRP pass 1 should operate in. | |
4518 | ||
4519 | .. gcc-param:: vrp2-mode | |
4520 | ||
4521 | Specifies the mode VRP pass 2 should operate in. | |
4522 | ||
4523 | .. gcc-param:: ranger-debug | |
4524 | ||
4525 | Specifies the type of debug output to be issued for ranges. | |
4526 | ||
4527 | .. gcc-param:: evrp-switch-limit | |
4528 | ||
4529 | Specifies the maximum number of switch cases before EVRP ignores a switch. | |
4530 | ||
4531 | .. gcc-param:: unroll-jam-min-percent | |
4532 | ||
4533 | The minimum percentage of memory references that must be optimized | |
4534 | away for the unroll-and-jam transformation to be considered profitable. | |
4535 | ||
4536 | .. gcc-param:: unroll-jam-max-unroll | |
4537 | ||
4538 | The maximum number of times the outer loop should be unrolled by | |
4539 | the unroll-and-jam transformation. | |
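As a rough illustration of the transformation these two parameters tune, the following C sketch shows a loop nest before and after unroll-and-jam with an unroll factor of 2. The function names and array sizes are made up for this example, and the second function is written by hand; it is not output from GCC's pass.

```c
#include <stddef.h>

#define N 8   /* illustrative size; must be even for the hand-unrolled variant */
#define M 16  /* illustrative size */

/* Original nest: each outer iteration reloads b[j] in the inner loop. */
void nest_plain(int a[N], const int b[M])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            a[i] += b[j] * (int)(i + 1);
}

/* Outer loop unrolled by 2 and the two inner-loop copies fused ("jammed").
   b[j] is now loaded once and reused for both outer iterations. */
void nest_unroll_jam(int a[N], const int b[M])
{
    for (size_t i = 0; i < N; i += 2)
        for (size_t j = 0; j < M; j++) {
            int bj = b[j];
            a[i]     += bj * (int)(i + 1);
            a[i + 1] += bj * (int)(i + 2);
        }
}
```

In this sketch the saving comes from the shared load of ``b[j]``; a larger unroll factor would share it across more outer iterations at the cost of a bigger loop body.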
4540 | ||
4541 | .. gcc-param:: max-rtl-if-conversion-unpredictable-cost | |
4542 | ||
4543 | Maximum permissible cost for the sequence that would be generated | |
4544 | by the RTL if-conversion pass for a branch that is considered unpredictable. | |
4545 | ||
4546 | .. gcc-param:: max-variable-expansions-in-unroller | |
4547 | ||
4548 | If :option:`-fvariable-expansion-in-unroller` is used, the maximum number | |
4549 | of times that an individual variable will be expanded during loop unrolling. | |
4550 | ||
4551 | .. gcc-param:: partial-inlining-entry-probability | |
4552 | ||
4553 | Maximum probability of the entry BB of split region | |
4554 | (in percent relative to entry BB of the function) | |
4555 | to make partial inlining happen. | |
4556 | ||
4557 | .. gcc-param:: max-tracked-strlens | |
4558 | ||
4559 | Maximum number of strings for which the strlen optimization pass will | |
4560 | track string lengths. | |
4561 | ||
4562 | .. gcc-param:: gcse-after-reload-partial-fraction | |
4563 | ||
4564 | The threshold ratio for performing partial redundancy | |
4565 | elimination after reload. | |
4566 | ||
4567 | .. gcc-param:: gcse-after-reload-critical-fraction | |
4568 | ||
4569 | The threshold ratio of the critical edges' execution count that | |
4570 | permits performing redundancy elimination after reload. | |
4571 | ||
4572 | .. gcc-param:: max-loop-header-insns | |
4573 | ||
4574 | The maximum number of insns in loop header duplicated | |
4575 | by the copy loop headers pass. | |
4576 | ||
4577 | .. gcc-param:: vect-epilogues-nomask | |
4578 | ||
4579 | Enable loop epilogue vectorization using smaller vector size. | |
4580 | ||
4581 | .. gcc-param:: vect-partial-vector-usage | |
4582 | ||
4583 | Controls when the loop vectorizer considers using partial vector loads | |
4584 | and stores as an alternative to falling back to scalar code. 0 stops | |
4585 | the vectorizer from ever using partial vector loads and stores. 1 allows | |
4586 | partial vector loads and stores if vectorization removes the need for the | |
4587 | code to iterate. 2 allows partial vector loads and stores in all loops. | |
4588 | The parameter only has an effect on targets that support partial | |
4589 | vector loads and stores. | |
4590 | ||
4591 | .. gcc-param:: vect-inner-loop-cost-factor | |
4592 | ||
4593 | The maximum factor which the loop vectorizer applies to the cost of statements | |
4594 | in an inner loop relative to the loop being vectorized. The factor applied | |
4595 | is the minimum of the estimated number of iterations of the inner loop and | |
4596 | this parameter. The default value of this parameter is 50. | |
4597 | ||
4598 | .. gcc-param:: vect-induction-float | |
4599 | ||
4600 | Enable loop vectorization of floating point inductions. | |
4601 | ||
4602 | .. gcc-param:: avoid-fma-max-bits | |
4603 | ||
4604 | Maximum number of bits for which we avoid creating FMAs. | |
4605 | ||
4606 | .. gcc-param:: sms-loop-average-count-threshold | |
4607 | ||
4608 | A threshold on the average loop count considered by the swing modulo scheduler. | |
4609 | ||
4610 | .. gcc-param:: sms-dfa-history | |
4611 | ||
4612 | The number of cycles the swing modulo scheduler considers when checking | |
4613 | conflicts using DFA. | |
4614 | ||
4615 | .. gcc-param:: graphite-allow-codegen-errors | |
4616 | ||
4617 | Whether codegen errors should be ICEs when :option:`-fchecking` is enabled. | |
4618 | ||
4619 | .. gcc-param:: sms-max-ii-factor | |
4620 | ||
4621 | A factor for tuning the upper bound that swing modulo scheduler | |
4622 | uses for scheduling a loop. | |
4623 | ||
4624 | .. gcc-param:: lra-max-considered-reload-pseudos | |
4625 | ||
4626 | The max number of reload pseudos which are considered during | |
4627 | spilling a non-reload pseudo. | |
4628 | ||
4629 | .. gcc-param:: max-pow-sqrt-depth | |
4630 | ||
4631 | Maximum depth of sqrt chains to use when synthesizing exponentiation | |
4632 | by a real constant. | |
4633 | ||
4634 | .. gcc-param:: max-dse-active-local-stores | |
4635 | ||
4636 | Maximum number of active local stores in RTL dead store elimination. | |
4637 | ||
4638 | .. gcc-param:: asan-instrument-allocas | |
4639 | ||
4640 | Enable asan allocas/VLAs protection. | |
4641 | ||
4642 | .. gcc-param:: max-iterations-computation-cost | |
4643 | ||
4644 | Bound on the cost of an expression to compute the number of iterations. | |
4645 | ||
4646 | .. gcc-param:: max-isl-operations | |
4647 | ||
4648 | Maximum number of isl operations, 0 means unlimited. | |
4649 | ||
4650 | .. gcc-param:: graphite-max-arrays-per-scop | |
4651 | ||
4652 | Maximum number of arrays per scop. | |
4653 | ||
4654 | .. gcc-param:: max-vartrack-reverse-op-size | |
4655 | ||
4656 | Max. size of loc list for which reverse ops should be added. | |
4657 | ||
4658 | .. gcc-param:: fsm-scale-path-stmts | |
4659 | ||
4660 | Scale factor to apply to the number of statements in a threading path | |
4661 | when comparing to the number of (scaled) blocks. | |
4662 | ||
4663 | .. gcc-param:: uninit-control-dep-attempts | |
4664 | ||
4665 | Maximum number of nested calls to search for control dependencies | |
4666 | during uninitialized variable analysis. | |
4667 | ||
4668 | .. gcc-param:: fsm-scale-path-blocks | |
4669 | ||
4670 | Scale factor to apply to the number of blocks in a threading path | |
4671 | when comparing to the number of (scaled) statements. | |
4672 | ||
4673 | .. gcc-param:: sched-autopref-queue-depth | |
4674 | ||
4675 | Hardware autoprefetcher scheduler model control flag. | |
4676 | Number of lookahead cycles the model looks into; at '0' only enable | |
4677 | instruction sorting heuristic. | |
4678 | ||
4679 | .. gcc-param:: loop-versioning-max-inner-insns | |
4680 | ||
4681 | The maximum number of instructions that an inner loop can have | |
4682 | before the loop versioning pass considers it too big to copy. | |
4683 | ||
4684 | .. gcc-param:: loop-versioning-max-outer-insns | |
4685 | ||
4686 | The maximum number of instructions that an outer loop can have | |
4687 | before the loop versioning pass considers it too big to copy, | |
4688 | discounting any instructions in inner loops that directly benefit | |
4689 | from versioning. | |
4690 | ||
4691 | .. gcc-param:: ssa-name-def-chain-limit | |
4692 | ||
4693 | The maximum number of SSA_NAME assignments to follow in determining | |
4694 | a property of a variable such as its value. This limits the number | |
4695 | of iterations or recursive calls GCC performs when optimizing certain | |
4696 | statements or when determining their validity prior to issuing | |
4697 | diagnostics. | |
4698 | ||
4699 | .. gcc-param:: store-merging-max-size | |
4700 | ||
4701 | Maximum size of a single store merging region in bytes. | |
4702 | ||
4703 | .. gcc-param:: hash-table-verification-limit | |
4704 | ||
4705 | The number of elements for which hash table verification is done | |
4706 | for each searched element. | |
4707 | ||
4708 | .. gcc-param:: max-find-base-term-values | |
4709 | ||
4710 | Maximum number of VALUEs handled during a single find_base_term call. | |
4711 | ||
4712 | .. gcc-param:: analyzer-max-enodes-per-program-point | |
4713 | ||
4714 | The maximum number of exploded nodes per program point within | |
4715 | the analyzer, before terminating analysis of that point. | |
4716 | ||
4717 | .. gcc-param:: analyzer-max-constraints | |
4718 | ||
4719 | The maximum number of constraints per state. | |
4720 | ||
4721 | .. gcc-param:: analyzer-min-snodes-for-call-summary | |
4722 | ||
4723 | The minimum number of supernodes within a function for the | |
4724 | analyzer to consider summarizing its effects at call sites. | |
4725 | ||
4726 | .. gcc-param:: analyzer-max-enodes-for-full-dump | |
4727 | ||
4728 | The maximum depth of exploded nodes that should appear in a dot dump | |
4729 | before switching to a less verbose format. | |
4730 | ||
4731 | .. gcc-param:: analyzer-max-recursion-depth | |
4732 | ||
4733 | The maximum number of times a callsite can appear in a call stack | |
4734 | within the analyzer, before terminating analysis of a call that would | |
4735 | recurse deeper. | |
4736 | ||
4737 | .. gcc-param:: analyzer-max-svalue-depth | |
4738 | ||
4739 | The maximum depth of a symbolic value, before approximating | |
4740 | the value as unknown. | |
4741 | ||
4742 | .. gcc-param:: analyzer-max-infeasible-edges | |
4743 | ||
4744 | The maximum number of infeasible edges to reject before declaring | |
4745 | a diagnostic as infeasible. | |
4746 | ||
4747 | .. gcc-param:: gimple-fe-computed-hot-bb-threshold | |
4748 | ||
4749 | The number of executions of a basic block which is considered hot. | |
4750 | The parameter is used only in GIMPLE FE. | |
4751 | ||
4752 | .. gcc-param:: analyzer-bb-explosion-factor | |
4753 | ||
4754 | The maximum number of 'after supernode' exploded nodes within the analyzer | |
4755 | per supernode, before terminating analysis. | |
4756 | ||
4757 | .. gcc-param:: ranger-logical-depth | |
4758 | ||
4759 | Maximum depth of logical expression evaluation ranger will look through | |
4760 | when evaluating outgoing edge ranges. | |
4761 | ||
4762 | .. gcc-param:: relation-block-limit | |
4763 | ||
4764 | Maximum number of relations the oracle will register in a basic block. | |
4765 | ||
4766 | .. gcc-param:: min-pagesize | |
4767 | ||
4768 | Minimum page size for warning purposes. | |
4769 | ||
4770 | .. gcc-param:: openacc-kernels | |
4771 | ||
4772 | Specify mode of OpenACC 'kernels' constructs handling. | |
4773 | With :option:`--param` :gcc-param:`openacc-kernels`:samp:`=decompose`, OpenACC 'kernels' | |
4774 | constructs are decomposed into parts, a sequence of compute | |
4775 | constructs, each then handled individually. | |
4776 | This is work in progress. | |
4777 | With :option:`--param` :gcc-param:`openacc-kernels`:samp:`=parloops`, OpenACC 'kernels' | |
4778 | constructs are handled by the :samp:`parloops` pass, en bloc. | |
4779 | This is the current default. | |
4780 | ||
4781 | .. gcc-param:: openacc-privatization | |
4782 | ||
4783 | Specify mode of OpenACC privatization diagnostics for | |
4784 | :option:`-fopt-info-omp-note` and applicable | |
4785 | :option:`-fdump-tree-*-details`. | |
4786 | With :option:`--param` :gcc-param:`openacc-privatization`:samp:`=quiet`, don't diagnose. | |
4787 | This is the current default. | |
4788 | With :option:`--param` :gcc-param:`openacc-privatization`:samp:`=noisy`, do diagnose. | |
4789 | ||
4790 | The following choices of :samp:`{name}` are available on AArch64 targets: | |
4791 | ||
4792 | .. gcc-param:: aarch64-sve-compare-costs | |
4793 | ||
4794 | When vectorizing for SVE, consider using 'unpacked' vectors for | |
4795 | smaller elements and use the cost model to pick the cheapest approach. | |
4796 | Also use the cost model to choose between SVE and Advanced SIMD vectorization. | |
4797 | ||
4798 | Using unpacked vectors includes storing smaller elements in larger | |
4799 | containers and accessing elements with extending loads and truncating | |
4800 | stores. | |
4801 | ||
4802 | .. gcc-param:: aarch64-float-recp-precision | |
4803 | ||
4804 | The number of Newton iterations for calculating the reciprocal for float type. | |
4805 | The precision of division is proportional to this param when division | |
4806 | approximation is enabled. The default value is 1. | |
4807 | ||
4808 | .. gcc-param:: aarch64-double-recp-precision | |
4809 | ||
4810 | The number of Newton iterations for calculating the reciprocal for double type. | |
4811 | The precision of division is proportional to this param when division | |
4812 | approximation is enabled. The default value is 2. | |
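To make the link between iteration count and precision concrete, here is a minimal C sketch of Newton-Raphson refinement of a reciprocal estimate. The function names and the starting estimate are hypothetical and chosen for illustration only; this is not the code GCC emits, which starts from a hardware estimate.

```c
/* Refine an estimate x0 of 1/d with `iters` Newton-Raphson steps:
   x <- x * (2 - d * x).  Each step roughly squares the relative error. */
double recip_newton(double d, double x0, int iters)
{
    double x = x0;
    for (int i = 0; i < iters; i++)
        x = x * (2.0 - d * x);
    return x;
}

/* Relative error |d * x - 1| of the estimate x for 1/d. */
double recip_rel_error(double d, double x)
{
    double e = d * x - 1.0;
    return e < 0 ? -e : e;
}
```

Starting from an estimate with about 10% relative error, one step leaves about 1% and two steps about 0.01%, which is why the double-precision parameter defaults to more iterations than the single-precision one.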
4813 | ||
4814 | .. gcc-param:: aarch64-autovec-preference | |
4815 | ||
4816 | Force an ISA selection strategy for auto-vectorization. Accepts values from | |
4817 | 0 to 4, inclusive. | |
4818 | ||
4819 | :samp:`0` | |
4820 | Use the default heuristics. | |
4821 | ||
4822 | :samp:`1` | |
4823 | Use only Advanced SIMD for auto-vectorization. | |
4824 | ||
4825 | :samp:`2` | |
4826 | Use only SVE for auto-vectorization. | |
4827 | ||
4828 | :samp:`3` | |
4829 | Use both Advanced SIMD and SVE. Prefer Advanced SIMD when the costs are | |
4830 | deemed equal. | |
4831 | ||
4832 | :samp:`4` | |
4833 | Use both Advanced SIMD and SVE. Prefer SVE when the costs are deemed equal. | |
4834 | ||
4835 | The default value is 0. | |
4836 | ||
4837 | .. gcc-param:: aarch64-loop-vect-issue-rate-niters | |
4838 | ||
4839 | The tuning for some AArch64 CPUs tries to take both latencies and issue | |
4840 | rates into account when deciding whether a loop should be vectorized | |
4841 | using SVE, vectorized using Advanced SIMD, or not vectorized at all. | |
4842 | If this parameter is set to :samp:`{n}`, GCC will not use this heuristic | |
4843 | for loops that are known to execute in fewer than :samp:`{n}` Advanced | |
4844 | SIMD iterations. | |
4845 | ||
4846 | .. gcc-param:: aarch64-vect-unroll-limit | |
4847 | ||
4848 | The vectorizer will use available tuning information to determine whether it | |
4849 | would be beneficial to unroll the main vectorized loop and by how much. This | |
4850 | parameter sets the upper bound of how much the vectorizer will unroll the main | |
4851 | loop. The default value is four. | |
4852 | ||
4853 | The following choices of :samp:`{name}` are available on i386 and x86_64 targets: | |
4854 | ||
4855 | .. gcc-param:: x86-stlf-window-ninsns | |
4856 | ||
3ed1b4ce | 4857 | Number of instructions above which the STLF stall penalty can be compensated. |