Commit | Line | Data |
---|---|---|
99dee823 | 1 | @c Copyright (C) 2010-2021 Free Software Foundation, Inc. |
3abe8cab JH |
2 | @c This is part of the GCC manual. |
3 | @c For copying conditions, see the file gcc.texi. | |
4 | @c Contributed by Jan Hubicka <jh@suse.cz> and | |
5 | @c Diego Novillo <dnovillo@google.com> | |
6 | ||
7 | @node LTO | |
8 | @chapter Link Time Optimization | |
9 | @cindex lto | |
10 | @cindex whopr | |
11 | @cindex wpa | |
12 | @cindex ltrans | |
13 | ||
548e68fc JL |
14 | Link Time Optimization (LTO) gives GCC the capability of |
15 | dumping its internal representation (GIMPLE) to disk, | |
16 | so that all the different compilation units that make up | |
17 | a single executable can be optimized as a single module. | |
18 | This expands the scope of inter-procedural optimizations | |
19 | to encompass the whole program (or, rather, everything | |
20 | that is visible at link time). | |
21 | ||
22 | @menu | |
23 | * LTO Overview:: Overview of LTO. | |
24 | * LTO object file layout:: LTO file sections in ELF. | |
25 | * IPA:: Using summary information in IPA passes. | |
26 | * WHOPR:: Whole program assumptions, | |
27 | linker plugin and symbol visibilities. | |
28 | * Internal flags:: Internal flags controlling @code{lto1}. | |
29 | @end menu | |
30 | ||
31 | @node LTO Overview | |
3abe8cab JH |
32 | @section Design Overview |
33 | ||
34 | Link time optimization is implemented as a GCC front end for a | |
35 | bytecode representation of GIMPLE that is emitted in special sections | |
36 | of @code{.o} files. Currently, LTO support is enabled in most | |
37 | ELF-based systems, as well as darwin, cygwin and mingw systems. | |
38 | ||
39 | Since GIMPLE bytecode is saved alongside final object code, object | |
40 | files generated with LTO support are larger than regular object files. | |
41 | This ``fat'' object format makes it easy to integrate LTO into | |
42 | existing build systems, as one can, for instance, produce archives of | |
43 | the files. Additionally, one might be able to ship one set of fat | |
44 | objects which could be used both for development and the production of | |
45 | optimized builds. A perhaps surprising side effect of this feature | |
0f605f72 | 46 | is that any mistake in the toolchain leads to LTO information not |
535b7874 | 47 | being used (e.g.@: an older @code{libtool} calling @code{ld} directly). |
3abe8cab JH |
48 | This is both an advantage, as the system is more robust, and a |
49 | disadvantage, as the user is not informed that the optimization has | |
50 | been disabled. | |
51 | ||
52 | The current implementation only produces ``fat'' objects, effectively | |
53 | doubling compilation time and increasing file sizes up to 5x the | |
54 | original size. This hides the problem that some tools, such as | |
55 | @code{ar} and @code{nm}, need to understand symbol tables of LTO | |
56 | sections. These tools were extended to use the plugin infrastructure, | |
57 | and with these problems solved, GCC will also support ``slim'' objects | |
58 | consisting of the intermediate code alone. | |
59 | ||
60 | At the highest level, LTO splits the compiler in two. The first half | |
61 | (the ``writer'') produces a streaming representation of all the | |
62 | internal data structures needed to optimize and generate code. This | |
63 | includes declarations, types, the callgraph and the GIMPLE representation | |
64 | of function bodies. | |
65 | ||
66 | When @option{-flto} is given during compilation of a source file, the | |
67 | pass manager executes all the passes in @code{all_lto_gen_passes}. | |
68 | Currently, this phase is composed of two IPA passes: | |
69 | ||
70 | @itemize @bullet | |
71 | @item @code{pass_ipa_lto_gimple_out} | |
72 | This pass executes the function @code{lto_output} in | |
73 | @file{lto-streamer-out.c}, which traverses the call graph encoding | |
535b7874 | 74 | every reachable declaration, type and function. This generates a |
3abe8cab JH |
75 | memory representation of all the file sections described below. |
76 | ||
77 | @item @code{pass_ipa_lto_finish_out} | |
78 | This pass executes the function @code{produce_asm_for_decls} in | |
79 | @file{lto-streamer-out.c}, which takes the memory image built in the | |
80 | previous pass and encodes it in the corresponding ELF file sections. | |
81 | @end itemize | |
82 | ||
83 | The second half of LTO support is the ``reader''. This is implemented | |
84 | as the GCC front end @file{lto1} in @file{lto/lto.c}. When | |
85 | @file{collect2} detects a link set of @code{.o}/@code{.a} files with | |
86 | LTO information and @option{-flto} is enabled, it invokes | |
87 | @file{lto1} which reads the set of files and aggregates them into a | |
88 | single translation unit for optimization. The main entry point for | |
89 | the reader is @file{lto/lto.c}:@code{lto_main}. | |
90 | ||
91 | @subsection LTO modes of operation | |
92 | ||
93 | One of the main goals of the GCC link-time infrastructure was to allow | |
94 | effective compilation of large programs. For this reason GCC implements two | |
95 | link-time compilation modes. | |
96 | ||
97 | @enumerate | |
98 | @item @emph{LTO mode}, in which the whole program is read into the | |
99 | compiler at link-time and optimized much as if it | |
100 | were a single source-level compilation unit. | |
101 | ||
102 | @item @emph{WHOPR or partitioned mode}, designed to utilize multiple | |
103 | CPUs and/or a distributed compilation environment to quickly link | |
104 | large applications. WHOPR stands for WHOle Program optimizeR (not to | |
105 | be confused with the semantics of @option{-fwhole-program}). It | |
106 | partitions the aggregated callgraph from many different @code{.o} | |
107 | files and distributes the compilation of the sub-graphs to different | |
108 | CPUs. | |
109 | ||
110 | Note that distributed compilation is not implemented yet, but since | |
111 | the parallelism is facilitated via generating a @code{Makefile}, it | |
112 | would be easy to implement. | |
113 | @end enumerate | |
114 | ||
115 | WHOPR splits LTO into three main stages: | |
116 | @enumerate | |
117 | @item Local generation (LGEN) | |
535b7874 | 118 | This stage executes in parallel. Every file in the program is compiled |
3abe8cab JH |
119 | into the intermediate language and packaged together with the local |
120 | call-graph and summary information. This stage is the same for both | |
121 | the LTO and WHOPR compilation modes. | |
122 | ||
123 | @item Whole Program Analysis (WPA) | |
535b7874 RW |
124 | WPA is performed sequentially. The global call-graph is generated, and |
125 | a global analysis procedure makes transformation decisions. The global | |
3abe8cab | 126 | call-graph is partitioned to facilitate parallel optimization during |
535b7874 | 127 | phase 3. The results of the WPA stage are stored into new object files |
3abe8cab JH |
128 | which contain the partitions of the program expressed in the intermediate |
129 | language and the optimization decisions. | |
130 | ||
131 | @item Local transformations (LTRANS) | |
535b7874 | 132 | This stage executes in parallel. All the decisions made during phase 2 |
3abe8cab | 133 | are implemented locally in each partitioned object file, and the final |
535b7874 | 134 | object code is generated. Optimizations which cannot be decided |
3abe8cab JH |
135 | efficiently during phase 2 may be performed on the local |
136 | call-graph partitions. | |
137 | @end enumerate | |
138 | ||
139 | WHOPR can be seen as an extension of the usual LTO mode of | |
535b7874 | 140 | compilation. In LTO, WPA and LTRANS are executed within a single |
3abe8cab JH |
141 | execution of the compiler, after the whole program has been read into |
142 | memory. | |
143 | ||
535b7874 | 144 | When compiling in WHOPR mode, the callgraph is partitioned during |
3abe8cab JH |
145 | the WPA stage. The whole program is split into a given number of |
146 | partitions of roughly the same size. The compiler tries to | |
147 | minimize the number of references which cross partition boundaries. | |
148 | The main advantage of WHOPR is to allow the parallel execution of | |
149 | LTRANS stages, which are the most time-consuming part of the | |
150 | compilation process. Additionally, it avoids the need to load the | |
151 | whole program into memory. | |
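
The partitioning goal described above (partitions of roughly equal size, with few references crossing partition boundaries) can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the algorithm GCC uses: the names @code{partition_callgraph} and @code{crossing_edges}, the size estimates and the greedy heuristic are all hypothetical.

```python
from collections import defaultdict


def partition_callgraph(sizes, edges, num_partitions):
    """Greedily assign functions to partitions of roughly equal size.

    sizes: dict mapping function name -> estimated size
    edges: iterable of (caller, callee) pairs
    Returns a dict mapping function name -> partition index.
    """
    neighbors = defaultdict(set)
    for caller, callee in edges:
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)

    # Soft size cap per partition (ceiling of the average size).
    limit = -(-sum(sizes.values()) // num_partitions)
    load = [0] * num_partitions
    assignment = {}

    # Place large functions first; ties are broken by name for determinism.
    for func in sorted(sizes, key=lambda f: (-sizes[f], f)):
        def score(part):
            # Prefer partitions with room, then those already holding
            # neighbors of func, then the least loaded one.
            fits = load[part] + sizes[func] <= limit
            shared = sum(1 for n in neighbors[func] if assignment.get(n) == part)
            return (fits, shared, -load[part])

        best = max(range(num_partitions), key=score)
        assignment[func] = best
        load[best] += sizes[func]
    return assignment


def crossing_edges(assignment, edges):
    """Count call edges whose endpoints live in different partitions."""
    return sum(1 for a, b in edges if assignment[a] != assignment[b])
```

In this model, each partition would be handed to a separate LTRANS process, and every crossing edge is a reference that the partition cannot resolve locally.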
152 | ||
153 | ||
548e68fc | 154 | @node LTO object file layout |
3abe8cab JH |
155 | @section LTO file sections |
156 | ||
157 | LTO information is stored in several ELF sections inside object files. | |
158 | Data structures and enum codes for sections are defined in | |
159 | @file{lto-streamer.h}. | |
160 | ||
161 | These sections are emitted from @file{lto-streamer-out.c} and mapped | |
162 | in all at once from @file{lto/lto.c}:@code{lto_file_read}. The | |
163 | individual functions dealing with the reading/writing of each section | |
164 | are described below. | |
165 | ||
166 | @itemize @bullet | |
167 | @item Command line options (@code{.gnu.lto_.opts}) | |
168 | ||
169 | This section contains the command line options used to generate the | |
535b7874 | 170 | object files. This is used at link time to determine the optimization |
3abe8cab JH |
171 | level and other settings when they are not explicitly specified at the |
172 | linker command line. | |
173 | ||
174 | Currently, GCC does not support combining LTO object files compiled | |
175 | with different sets of command line options into a single binary. | |
535b7874 | 176 | At link time, the options given on the command line and the options |
3abe8cab JH |
177 | saved on all the files in a link-time set are applied globally. No |
178 | attempt is made at validating the combination of flags (other than the | |
179 | usual validation done by option processing). This is implemented in | |
180 | @file{lto/lto.c}:@code{lto_read_all_file_options}. | |
181 | ||
182 | ||
183 | @item Symbol table (@code{.gnu.lto_.symtab}) | |
184 | ||
185 | This table replaces the ELF symbol table for functions and variables | |
535b7874 | 186 | represented in the LTO IL. Symbols used and exported by the optimized |
3abe8cab JH |
187 | assembly code of ``fat'' objects might not match the ones used and |
188 | exported by the intermediate code. This table is necessary because | |
189 | the intermediate code is less optimized and thus requires a separate | |
190 | symbol table. | |
191 | ||
192 | Additionally, the binary code in the ``fat'' object may lack a call | |
193 | to a function if the call was optimized out at compilation time | |
194 | after the intermediate language was streamed out. In some special | |
535b7874 | 195 | cases, the same optimization may not happen during link-time |
3abe8cab JH |
196 | optimization. This would lead to an undefined symbol if only one |
197 | symbol table was used. | |
198 | ||
199 | The symbol table is emitted in | |
200 | @file{lto-streamer-out.c}:@code{produce_symtab}. | |
201 | ||
202 | ||
203 | @item Global declarations and types (@code{.gnu.lto_.decls}) | |
204 | ||
205 | This section contains an intermediate language dump of all | |
206 | declarations and types required to represent the callgraph, static | |
207 | variables and top-level debug info. | |
208 | ||
209 | The contents of this section are emitted in | |
210 | @file{lto-streamer-out.c}:@code{produce_asm_for_decls}. Types and | |
211 | symbols are emitted in a topological order that preserves the sharing | |
212 | of pointers when the file is read back in | |
213 | (@file{lto.c}:@code{read_cgraph_and_symbols}). | |
214 | ||
215 | ||
216 | @item The callgraph (@code{.gnu.lto_.cgraph}) | |
217 | ||
218 | This section contains the basic data structure used by the GCC | |
535b7874 | 219 | inter-procedural optimization infrastructure. This section stores an |
3abe8cab JH |
220 | annotated multi-graph which represents the functions and call sites as |
221 | well as the variables, aliases and top-level @code{asm} statements. | |
222 | ||
223 | This section is emitted in | |
224 | @file{lto-streamer-out.c}:@code{output_cgraph} and read in | |
225 | @file{lto-cgraph.c}:@code{input_cgraph}. | |
226 | ||
227 | ||
228 | @item IPA references (@code{.gnu.lto_.refs}) | |
229 | ||
230 | This section contains references between functions and static | |
231 | variables. It is emitted by @file{lto-cgraph.c}:@code{output_refs} | |
232 | and read by @file{lto-cgraph.c}:@code{input_refs}. | |
233 | ||
234 | ||
235 | @item Function bodies (@code{.gnu.lto_.function_body.<name>}) | |
236 | ||
237 | This section contains function bodies in the intermediate language | |
535b7874 | 238 | representation. Every function body is in a separate section to allow |
3abe8cab JH |
239 | copying of the section independently to different object files or |
240 | reading the function on demand. | |
241 | ||
242 | Functions are emitted in | |
243 | @file{lto-streamer-out.c}:@code{output_function} and read in | |
244 | @file{lto-streamer-in.c}:@code{input_function}. | |
245 | ||
246 | ||
247 | @item Static variable initializers (@code{.gnu.lto_.vars}) | |
248 | ||
249 | This section contains all the symbols in the global variable pool. It | |
250 | is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in | |
251 | @file{lto-cgraph.c}:@code{input_cgraph}. | |
252 | ||
253 | @item Summaries and optimization summaries used by IPA passes | |
254 | (@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs}, | |
255 | @code{pureconst} or @code{reference}) | |
256 | ||
257 | These sections are used by IPA passes that need to emit summary | |
258 | information during LTO generation to be read and aggregated at | |
259 | link time. Each pass is responsible for implementing two pass manager | |
260 | hooks: one for writing the summary and another for reading it in. The | |
261 | format of these sections is entirely up to each individual pass. The | |
262 | only requirement is that the writer and reader hooks agree on the | |
263 | format. | |
264 | @end itemize | |
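
The only hard requirement on a summary section, that the writer and reader hooks agree on the format, can be shown with a small round trip. The following Python sketch is purely illustrative: the record layout (a count, then a node index and a flags byte per record) is an assumption made for this example and does not correspond to any real GCC section format.

```python
import struct

# Hypothetical record layout: node index (uint32) followed by a flags byte.
RECORD = struct.Struct("<IB")


def write_summary(summaries):
    """Encode {node_index: flags} as a count followed by fixed-size records."""
    out = [struct.pack("<I", len(summaries))]
    for node, flags in sorted(summaries.items()):
        out.append(RECORD.pack(node, flags))
    return b"".join(out)


def read_summary(data):
    """Decode the bytes produced by write_summary, in the same order."""
    (count,) = struct.unpack_from("<I", data, 0)
    offset = 4
    summaries = {}
    for _ in range(count):
        node, flags = RECORD.unpack_from(data, offset)
        summaries[node] = flags
        offset += RECORD.size
    return summaries
```

Because nothing else interprets these bytes, a pass is free to change the layout as long as both functions change together.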
265 | ||
266 | ||
548e68fc | 267 | @node IPA |
3abe8cab JH |
268 | @section Using summary information in IPA passes |
269 | ||
270 | Programs are represented internally as a @emph{callgraph} (a | |
271 | multi-graph where nodes are functions and edges are call sites) | |
272 | and a @emph{varpool} (a list of static and external variables in | |
273 | the program). | |
274 | ||
275 | The inter-procedural optimization is organized as a sequence of | |
276 | individual passes, which operate on the callgraph and the | |
277 | varpool. To make the implementation of WHOPR possible, every | |
278 | inter-procedural optimization pass is split into several stages | |
279 | that are executed at different times during WHOPR compilation: | |
280 | ||
281 | @itemize @bullet | |
282 | @item LGEN time | |
283 | @enumerate | |
284 | @item @emph{Generate summary} (@code{generate_summary} in | |
535b7874 | 285 | @code{struct ipa_opt_pass_d}). This stage analyzes every function |
3abe8cab JH |
286 | body and variable initializer and stores relevant |
287 | information into a pass-specific data structure. | |
288 | ||
289 | @item @emph{Write summary} (@code{write_summary} in | |
4a087ccf | 290 | @code{struct ipa_opt_pass_d}). This stage writes all the |
3abe8cab JH |
291 | pass-specific information generated by @code{generate_summary}. |
292 | Summaries go into their own @code{LTO_section_*} sections that | |
293 | have to be declared in @file{lto-streamer.h}:@code{enum | |
294 | lto_section_type}. A new section is created by calling | |
295 | @code{create_output_block} and data can be written using the | |
296 | @code{lto_output_*} routines. | |
297 | @end enumerate | |
298 | ||
299 | @item WPA time | |
300 | @enumerate | |
301 | @item @emph{Read summary} (@code{read_summary} in | |
535b7874 | 302 | @code{struct ipa_opt_pass_d}). This stage reads all the |
3abe8cab JH |
303 | pass-specific information in exactly the same order that it was |
304 | written by @code{write_summary}. | |
305 | ||
306 | @item @emph{Execute} (@code{execute} in @code{struct | |
307 | opt_pass}). This performs inter-procedural propagation. This | |
308 | must be done without actual access to the individual function | |
309 | bodies or variable initializers. Typically, this results in a | |
310 | transitive closure operation over the summary information of all | |
311 | the nodes in the callgraph. | |
312 | ||
313 | @item @emph{Write optimization summary} | |
314 | (@code{write_optimization_summary} in @code{struct | |
315 | ipa_opt_pass_d}). This writes the result of the inter-procedural | |
316 | propagation into the object file. This can use the same data | |
317 | structures and helper routines used in @code{write_summary}. | |
318 | @end enumerate | |
319 | ||
320 | @item LTRANS time | |
321 | @enumerate | |
322 | @item @emph{Read optimization summary} | |
323 | (@code{read_optimization_summary} in @code{struct | |
324 | ipa_opt_pass_d}). The counterpart to | |
325 | @code{write_optimization_summary}. This reads the interprocedural | |
326 | optimization decisions in exactly the same format emitted by | |
327 | @code{write_optimization_summary}. | |
328 | ||
329 | @item @emph{Transform} (@code{function_transform} and | |
330 | @code{variable_transform} in @code{struct ipa_opt_pass_d}). | |
331 | The actual function bodies and variable initializers are updated | |
332 | based on the information passed down from the @emph{Execute} stage. | |
333 | @end enumerate | |
334 | @end itemize | |
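
The division of labor between the stages listed above can be modelled with a toy purity analysis. This Python sketch is not GCC code: the function names only mirror the hook names, the dictionary-based ``bodies'' and the purity rule are assumptions chosen for illustration, and the streaming stages are omitted.

```python
def generate_summary(bodies):
    """LGEN: inspect each function body once and record local facts.

    bodies maps a function name to a dict with 'writes_global' (bool) and
    'calls' (list of callee names).
    """
    return {
        name: {"locally_pure": not body["writes_global"],
               "calls": list(body["calls"])}
        for name, body in bodies.items()
    }


def execute(summary):
    """WPA: propagate over the callgraph using summaries only.

    A function is pure if it is locally pure and all its callees are pure;
    unknown callees are treated as impure.
    """
    pure = {name for name, info in summary.items() if info["locally_pure"]}
    changed = True
    while changed:
        changed = False
        # sorted() copies the set, so discarding while looping is safe.
        for name in sorted(pure):
            if any(callee not in pure for callee in summary[name]["calls"]):
                pure.discard(name)
                changed = True
    return pure


def transform(bodies, pure):
    """LTRANS: apply the decisions to the actual function bodies."""
    return {name: dict(body, pure=name in pure) for name, body in bodies.items()}
```

Note that @code{execute} never touches @code{bodies}; it sees only what @code{generate_summary} recorded, which is what allows the propagation to run at WPA time without loading function bodies.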
335 | ||
336 | The implementation of the inter-procedural passes is shared | |
337 | between LTO, WHOPR and classic non-LTO compilation. | |
338 | ||
339 | @itemize | |
340 | @item During the traditional file-by-file mode every pass executes its | |
341 | own @emph{Generate summary}, @emph{Execute}, and @emph{Transform} | |
342 | stages within the single execution context of the compiler. | |
343 | ||
344 | @item In LTO compilation mode, every pass uses @emph{Generate | |
345 | summary} and @emph{Write summary} stages at compilation time, | |
346 | while the @emph{Read summary}, @emph{Execute}, and | |
347 | @emph{Transform} stages are executed at link time. | |
348 | ||
349 | @item In WHOPR mode all stages are used. | |
350 | @end itemize | |
351 | ||
352 | To simplify development, the GCC pass manager differentiates | |
8faf3ed9 XHL |
353 | between normal inter-procedural passes (@pxref{Regular IPA passes}), |
354 | small inter-procedural passes (@pxref{Small IPA passes}) | |
355 | and late inter-procedural passes (@pxref{Late IPA passes}). | |
356 | A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does | |
357 | everything at once and thus cannot be executed during WPA in | |
535b7874 | 358 | WHOPR mode. It defines only the @emph{Execute} stage and during |
3abe8cab JH |
359 | this stage it accesses and modifies the function bodies. Such |
360 | passes are useful for optimization at LGEN or LTRANS time and are | |
361 | used, for example, to implement early optimization before writing | |
362 | object files. The simple inter-procedural passes can also be used | |
363 | for easier prototyping and development of a new inter-procedural | |
364 | pass. | |
365 | ||
366 | ||
367 | @subsection Virtual clones | |
368 | ||
369 | One of the main challenges of introducing the WHOPR compilation | |
370 | mode was addressing the interactions between optimization passes. | |
371 | In LTO compilation mode, the passes are executed in a sequence, | |
372 | each of which consists of analysis (or @emph{Generate summary}), | |
373 | propagation (or @emph{Execute}) and @emph{Transform} stages. | |
374 | Once the work of one pass is finished, the next pass sees the | |
375 | updated program representation and can execute. This makes the | |
376 | individual passes dependent on each other. | |
377 | ||
378 | In WHOPR mode all passes first execute their @emph{Generate | |
379 | summary} stage. Then summary writing marks the end of the LGEN | |
380 | stage. At WPA time, | |
381 | the summaries are read back into memory and all passes run the | |
382 | @emph{Execute} stage. Optimization summaries are streamed and | |
383 | sent to LTRANS, where all the passes execute the @emph{Transform} | |
384 | stage. | |
385 | ||
386 | Most optimization passes split naturally into analysis, | |
387 | propagation and transformation stages. But some do not. The | |
388 | main problem arises when one pass performs changes and the | |
389 | following pass gets confused by seeing different callgraphs | |
535b7874 | 390 | between the @emph{Transform} stage and the @emph{Generate summary} |
3abe8cab JH |
391 | or @emph{Execute} stage. This means that the passes are required |
392 | to communicate their decisions with each other. | |
393 | ||
394 | To facilitate this communication, the GCC callgraph | |
395 | infrastructure implements @emph{virtual clones}, a method of | |
396 | representing the changes performed by the optimization passes in | |
397 | the callgraph without needing to update function bodies. | |
398 | ||
399 | A @emph{virtual clone} in the callgraph is a function that has no | |
400 | associated body, just a description of how to create its body based | |
401 | on a different function (which itself may be a virtual clone). | |
402 | ||
403 | The description of function modifications includes adjustments to | |
404 | the function's signature (which allows, for example, removing or | |
405 | adding function arguments), substitutions to perform on the | |
406 | function body, and, for inlined functions, a pointer to the | |
407 | function that it will be inlined into. | |
408 | ||
409 | It is also possible to redirect any edge of the callgraph from a | |
410 | function to its virtual clone. This implies updating of the call | |
411 | site to adjust for the new function signature. | |
412 | ||
413 | Most of the transformations performed by inter-procedural | |
414 | optimizations can be represented via virtual clones. For | |
415 | instance, a constant propagation pass can produce a virtual clone | |
416 | of the function which replaces one of its arguments by a | |
417 | constant. The inliner can represent its decisions by producing a | |
418 | clone of a function whose body will be later integrated into | |
419 | a given function. | |
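
As a rough illustration of the idea, a virtual clone can be modelled as a node that has no body of its own, only a reference to its origin and a description of the changes. The Python sketch below is a toy model under stated assumptions: @code{Node}, @code{make_virtual_clone} and @code{materialize} are hypothetical names, and the only supported modification is replacing parameters by constants.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Node:
    """A callgraph node: either a real function or a virtual clone."""
    name: str
    params: Tuple[str, ...]
    body: Optional[Callable] = None      # None for a virtual clone
    origin: Optional["Node"] = None      # function the clone derives from
    constants: Dict[str, object] = field(default_factory=dict)


def make_virtual_clone(origin, name, constants):
    """Record a clone that drops the given parameters, fixing their values."""
    params = tuple(p for p in origin.params if p not in constants)
    return Node(name, params, origin=origin, constants=dict(constants))


def materialize(node):
    """Turn a (possibly chained) virtual clone into a callable body."""
    if node.body is not None:
        return node.body
    base = materialize(node.origin)
    fixed = dict(node.constants)

    def body(**kwargs):
        # Supply the recorded constants, then the remaining arguments.
        return base(**fixed, **kwargs)

    node.body = body
    return body
```

Creating a clone is cheap because no body is copied; the work is deferred to @code{materialize}, which also handles a clone whose origin is itself a virtual clone.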
420 | ||
421 | Using @emph{virtual clones}, the program can be easily updated | |
422 | during the @emph{Execute} stage, solving most of the pass interaction | |
423 | problems that would otherwise occur during @emph{Transform}. | |
424 | ||
425 | Virtual clones are later materialized in the LTRANS stage and | |
426 | turned into real functions. Passes executed after the virtual | |
427 | clones were introduced also perform their @emph{Transform} stage | |
428 | on new functions, so for a pass there is no significant | |
429 | difference between operating on a real function or a virtual | |
430 | clone introduced before its @emph{Execute} stage. | |
431 | ||
432 | Optimization passes then work on virtual clones introduced before | |
433 | their @emph{Execute} stage as if they were real functions. The | |
434 | only difference is that clones are not visible during the | |
435 | @emph{Generate Summary} stage. | |
436 | ||
437 | To keep function summaries updated, the callgraph interface | |
438 | allows an optimizer to register a callback that is called every | |
439 | time a new clone is introduced as well as when the actual | |
440 | function or variable is generated or when a function or variable | |
441 | is removed. These hooks are registered in the @emph{Generate | |
442 | summary} stage and allow the pass to keep its information intact | |
443 | until the @emph{Execute} stage. The same hooks can also be | |
444 | registered during the @emph{Execute} stage to keep the | |
445 | optimization summaries updated for the @emph{Transform} stage. | |
446 | ||
447 | @subsection IPA references | |
448 | ||
449 | GCC represents IPA references in the callgraph. For a function | |
450 | or variable @code{A}, the @emph{IPA reference} is a list of all | |
451 | locations where the address of @code{A} is taken and, when | |
452 | @code{A} is a variable, a list of all direct stores and reads | |
535b7874 | 453 | to/from @code{A}. References represent an oriented multi-graph on |
3abe8cab JH |
454 | the union of nodes of the callgraph and the varpool. See |
455 | @file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary} | |
456 | and | |
457 | @file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary} | |
458 | for details. | |
459 | ||
460 | @subsection Jump functions | |
461 | Suppose that an optimization pass sees a function @code{A} and it | |
462 | knows the values of (some of) its arguments. The @emph{jump | |
463 | function} describes the value of a parameter of a given function | |
464 | call in function @code{A} based on this knowledge. | |
465 | ||
466 | Jump functions are used by several optimizations, such as the | |
467 | inter-procedural constant propagation pass and the | |
468 | devirtualization pass. The inliner also uses jump functions to | |
469 | perform inlining of callbacks. | |
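
A simplified model may help make the notion concrete. In the Python sketch below, each actual argument of a call site carries one jump function; the three kinds shown (constant, pass-through and unknown) are a reduced, hypothetical set chosen for illustration, and @code{evaluate_call} is not a GCC routine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """The argument is a compile-time constant."""
    value: object


@dataclass(frozen=True)
class PassThrough:
    """The argument is the caller's own parameter, passed on unchanged."""
    param_index: int


@dataclass(frozen=True)
class Unknown:
    """Nothing useful is known about the argument."""


def evaluate_call(jump_functions, caller_known):
    """Compute known argument values for one call site.

    jump_functions: one jump function per actual argument of the call.
    caller_known: dict mapping caller parameter index -> known constant.
    Returns a dict mapping callee parameter index -> known constant.
    """
    known = {}
    for index, jf in enumerate(jump_functions):
        if isinstance(jf, Const):
            known[index] = jf.value
        elif isinstance(jf, PassThrough) and jf.param_index in caller_known:
            known[index] = caller_known[jf.param_index]
    return known
```

For a call @code{g (4, a, read_input ())} inside @code{f (a, b)}, the jump functions would be @code{[Const(4), PassThrough(0), Unknown()]}; once a propagation pass learns the value of @code{a}, the second argument of @code{g} becomes known as well.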
470 | ||
548e68fc | 471 | @node WHOPR |
3abe8cab JH |
472 | @section Whole program assumptions, linker plugin and symbol visibilities |
473 | ||
474 | Link-time optimization gives relatively minor benefits when used | |
475 | alone. The problem is that propagation of inter-procedural | |
476 | information does not work well across functions and variables | |
477 | that are called or referenced by other compilation units (such as | |
535b7874 | 478 | from a dynamically linked library). We say that such functions |
a77fa1fc | 479 | and variables are @emph{externally visible}. |
3abe8cab JH |
480 | |
481 | To make the situation even more difficult, many applications | |
482 | organize themselves as a set of shared libraries, and the default | |
483 | ELF visibility rules allow one to overwrite any externally | |
484 | visible symbol with a different symbol at runtime. This | |
485 | basically disables any optimizations across such functions and | |
486 | variables, because the compiler cannot be sure that the function | |
487 | body it is seeing is the same function body that will be used at | |
488 | runtime. Any function or variable not declared @code{static} in | |
489 | the sources degrades the quality of inter-procedural | |
490 | optimization. | |
491 | ||
492 | To avoid this problem the compiler must assume that it sees the | |
493 | whole program when doing link-time optimization. Strictly | |
494 | speaking, the whole program is rarely visible even at link-time. | |
495 | Standard system libraries are usually linked dynamically or not | |
496 | provided with the link-time information. In GCC, the whole | |
497 | program option (@option{-fwhole-program}) asserts that every | |
498 | function and variable defined in the current compilation | |
499 | unit is static, except for function @code{main} (note: at | |
535b7874 | 500 | link time, the current unit is the union of all objects compiled |
3abe8cab JH |
501 | with LTO). Since some functions and variables need to |
502 | be referenced externally, for example by another DSO or from an | |
503 | assembler file, GCC also provides the function and variable | |
504 | attribute @code{externally_visible} which can be used to disable | |
505 | the effect of @option{-fwhole-program} on a specific symbol. | |
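
The rule stated above can be written down as a small decision function. This Python sketch only restates the paragraph in code; the symbol records and the name @code{localize} are hypothetical, and the real compiler considers more conditions than are modelled here.

```python
def localize(symbols, whole_program):
    """Return the set of symbol names that may be treated as static.

    symbols maps a name to a dict with boolean keys 'static' and
    'externally_visible' (the latter modelling the attribute of that name).
    """
    local = set()
    for name, info in symbols.items():
        if info["static"]:
            # Already local in the source.
            local.add(name)
        elif whole_program and name != "main" and not info["externally_visible"]:
            # Whole-program assumption: nothing outside refers to it.
            local.add(name)
    return local
```

Without the whole-program assumption only symbols declared @code{static} qualify; with it, everything except @code{main} and explicitly marked symbols does.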
506 | ||
507 | The whole program mode assumptions are slightly more complex in | |
508 | C++, where inline functions in headers are put into @emph{COMDAT} | |
535b7874 | 509 | sections. COMDAT functions and variables can be defined by |
3abe8cab JH |
510 | multiple object files and their bodies are unified at link-time |
511 | and dynamic link-time. COMDAT functions are changed to local only | |
512 | when their address is not taken and thus un-sharing them with a | |
513 | library is not harmful. COMDAT variables always remain externally | |
514 | visible; however, for read-only variables it is assumed that their | |
515 | initializers cannot be overwritten by a different value. | |
516 | ||
517 | GCC provides the function and variable attribute | |
518 | @code{visibility} that can be used to specify the visibility of | |
519 | externally visible symbols (or alternatively the | |
520 | @option{-fvisibility} command line option). ELF defines | |
521 | the @code{default}, @code{protected}, @code{hidden} and | |
522 | @code{internal} visibilities. | |
523 | ||
535b7874 | 524 | The most commonly used visibility is @code{hidden}. It |
3abe8cab | 525 | specifies that the symbol cannot be referenced from outside of |
535b7874 | 526 | the current shared library. Unfortunately, this information |
3abe8cab JH |
527 | cannot be used directly by the link-time optimization in the |
528 | compiler since the whole shared library also might contain | |
529 | non-LTO objects and those are not visible to the compiler. | |
530 | ||
531 | GCC solves this problem using linker plugins. A @emph{linker | |
532 | plugin} is an interface to the linker that allows an external | |
533 | program to claim the ownership of a given object file. The linker | |
534 | then performs the linking procedure by querying the plugin about | |
535 | the symbol table of the claimed objects and once the linking | |
536 | decisions are complete, the plugin is allowed to provide the | |
537 | final object file before the actual linking is made. The linker | |
538 | plugin obtains the symbol resolution information which specifies | |
539 | which symbols provided by the claimed objects are bound from the | |
540 | rest of a binary being linked. | |
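
How symbol resolution information might feed back into optimization can be sketched as follows. This Python model is an assumption-laden simplification: the resolution kinds are named after a subset of the @code{LDPR_*} constants of the linker plugin interface, while the function @code{classify_symbols} and the three-way decision it makes are hypothetical.

```python
def classify_symbols(resolutions):
    """Map each symbol's resolution kind to a simplified decision.

    resolutions: dict mapping symbol name -> resolution kind (string).
    Decisions: 'local' (no outside references, may be made static),
    'exported' (must stay visible), 'discard' (another definition
    prevails) and 'unknown' (anything not modelled here).
    """
    result = {}
    for name, kind in resolutions.items():
        if kind == "PREVAILING_DEF_IRONLY":
            result[name] = "local"
        elif kind in ("PREVAILING_DEF", "PREVAILING_DEF_IRONLY_EXP"):
            result[name] = "exported"
        elif kind in ("PREEMPTED_REG", "PREEMPTED_IR"):
            result[name] = "discard"
        else:
            result[name] = "unknown"
    return result
```

The point of the model is that the linker, not the compiler, knows whether non-LTO objects refer to a symbol, so only with this information can a symbol safely be treated as local.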
541 | ||
3abe8cab JH |
542 | GCC is designed to be independent of the rest of the toolchain |
543 | and aims to support linkers without plugin support. For this | |
544 | reason it does not use the linker plugin by default. Instead, | |
545 | the object files are examined by @command{collect2} before being | |
546 | passed to the linker and objects found to have LTO sections are | |
547 | passed to @command{lto1} first. This mode does not work for | |
535b7874 | 548 | library archives. The decision on which object files from the |
3abe8cab JH |
549 | archive are needed depends on the actual linking and thus GCC |
550 | would have to implement the linker itself. The resolution | |
551 | information is missing too and thus GCC needs to make an educated | |
552 | guess based on @option{-fwhole-program}. Without the linker | |
553 | plugin GCC also assumes that symbols are declared @code{hidden} | |
554 | and not referred to by non-LTO code by default. | |
555 | ||
548e68fc | 556 | @node Internal flags |
3abe8cab JH |
557 | @section Internal flags controlling @code{lto1} |
558 | ||
559 | The following flags are passed into @command{lto1} and are not | |
560 | meant to be used directly from the command line. | |
561 | ||
562 | @itemize | |
563 | @item -fwpa | |
564 | @opindex fwpa | |
565 | This option runs the serial part of the link-time optimizer | |
566 | performing the inter-procedural propagation (WPA mode). The | |
567 | compiler reads in summary information from all inputs and | |
568 | performs an analysis based on summary information only. It | |
569 | generates object files for subsequent runs of the link-time | |
570 | optimizer where individual object files are optimized using both | |
571 | summary information from the WPA mode and the actual function | |
572 | bodies. It then drives the LTRANS phase. | |
573 | ||
574 | @item -fltrans | |
575 | @opindex fltrans | |
576 | This option runs the link-time optimizer in the | |
577 | local-transformation (LTRANS) mode, which reads in output from a | |
535b7874 | 578 | previous run of the LTO in WPA mode. In the LTRANS mode, LTO |
3abe8cab JH |
579 | optimizes an object and produces the final assembly. |
580 | ||
581 | @item -fltrans-output-list=@var{file} | |
582 | @opindex fltrans-output-list | |
583 | This option specifies a file to which the names of LTRANS output | |
584 | files are written. This option is only meaningful in conjunction | |
585 | with @option{-fwpa}. | |
c2679d84 RB |
586 | |
587 | @item -fresolution=@var{file} | |
588 | @opindex fresolution | |
589 | This option specifies the linker resolution file. This option is | |
590 | only meaningful in conjunction with @option{-fwpa} and as an option | |
6404e190 | 591 | to pass through to the LTO linker plugin. |
3abe8cab | 592 | @end itemize |