]> git.ipfire.org Git - thirdparty/gcc.git/blame - gcc/doc/lto.texi
Update copyright years.
[thirdparty/gcc.git] / gcc / doc / lto.texi
CommitLineData
99dee823 1@c Copyright (C) 2010-2021 Free Software Foundation, Inc.
3abe8cab
JH
2@c This is part of the GCC manual.
3@c For copying conditions, see the file gcc.texi.
4@c Contributed by Jan Hubicka <jh@suse.cz> and
5@c Diego Novillo <dnovillo@google.com>
6
7@node LTO
8@chapter Link Time Optimization
9@cindex lto
10@cindex whopr
11@cindex wpa
12@cindex ltrans
13
548e68fc
JL
14Link Time Optimization (LTO) gives GCC the capability of
15dumping its internal representation (GIMPLE) to disk,
16so that all the different compilation units that make up
17a single executable can be optimized as a single module.
18This expands the scope of inter-procedural optimizations
19to encompass the whole program (or, rather, everything
20that is visible at link time).
21
22@menu
23* LTO Overview:: Overview of LTO.
24* LTO object file layout:: LTO file sections in ELF.
25* IPA:: Using summary information in IPA passes.
26* WHOPR:: Whole program assumptions,
27 linker plugin and symbol visibilities.
28* Internal flags:: Internal flags controlling @code{lto1}.
29@end menu
30
31@node LTO Overview
3abe8cab
JH
32@section Design Overview
33
34Link time optimization is implemented as a GCC front end for a
35bytecode representation of GIMPLE that is emitted in special sections
36of @code{.o} files. Currently, LTO support is enabled in most
37ELF-based systems, as well as darwin, cygwin and mingw systems.
38
39Since GIMPLE bytecode is saved alongside final object code, object
40files generated with LTO support are larger than regular object files.
41This ``fat'' object format makes it easy to integrate LTO into
42existing build systems, as one can, for instance, produce archives of
43the files. Additionally, one might be able to ship one set of fat
44objects which could be used both for development and the production of
45optimized builds. A, perhaps surprising, side effect of this feature
0f605f72 46is that any mistake in the toolchain leads to LTO information not
535b7874 47being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
3abe8cab
JH
48This is both an advantage, as the system is more robust, and a
49disadvantage, as the user is not informed that the optimization has
50been disabled.
51
52The current implementation only produces ``fat'' objects, effectively
53doubling compilation time and increasing file sizes up to 5x the
54original size. This hides the problem that some tools, such as
55@code{ar} and @code{nm}, need to understand symbol tables of LTO
56sections. These tools were extended to use the plugin infrastructure,
57and with these problems solved, GCC will also support ``slim'' objects
58consisting of the intermediate code alone.
59
60At the highest level, LTO splits the compiler in two. The first half
61(the ``writer'') produces a streaming representation of all the
62internal data structures needed to optimize and generate code. This
63includes declarations, types, the callgraph and the GIMPLE representation
64of function bodies.
65
66When @option{-flto} is given during compilation of a source file, the
67pass manager executes all the passes in @code{all_lto_gen_passes}.
68Currently, this phase is composed of two IPA passes:
69
70@itemize @bullet
71@item @code{pass_ipa_lto_gimple_out}
72This pass executes the function @code{lto_output} in
73@file{lto-streamer-out.c}, which traverses the call graph encoding
535b7874 74every reachable declaration, type and function. This generates a
3abe8cab
JH
75memory representation of all the file sections described below.
76
77@item @code{pass_ipa_lto_finish_out}
78This pass executes the function @code{produce_asm_for_decls} in
79@file{lto-streamer-out.c}, which takes the memory image built in the
80previous pass and encodes it in the corresponding ELF file sections.
81@end itemize
82
83The second half of LTO support is the ``reader''. This is implemented
84as the GCC front end @file{lto1} in @file{lto/lto.c}. When
85@file{collect2} detects a link set of @code{.o}/@code{.a} files with
86LTO information and the @option{-flto} is enabled, it invokes
87@file{lto1} which reads the set of files and aggregates them into a
88single translation unit for optimization. The main entry point for
89the reader is @file{lto/lto.c}:@code{lto_main}.
90
91@subsection LTO modes of operation
92
93One of the main goals of the GCC link-time infrastructure was to allow
94effective compilation of large programs. For this reason GCC implements two
95link-time compilation modes.
96
97@enumerate
98@item @emph{LTO mode}, in which the whole program is read into the
99compiler at link-time and optimized in a similar way as if it
100were a single source-level compilation unit.
101
102@item @emph{WHOPR or partitioned mode}, designed to utilize multiple
103CPUs and/or a distributed compilation environment to quickly link
104large applications. WHOPR stands for WHOle Program optimizeR (not to
105be confused with the semantics of @option{-fwhole-program}). It
106partitions the aggregated callgraph from many different @code{.o}
107files and distributes the compilation of the sub-graphs to different
108CPUs.
109
110Note that distributed compilation is not implemented yet, but since
111the parallelism is facilitated via generating a @code{Makefile}, it
112would be easy to implement.
113@end enumerate
114
115WHOPR splits LTO into three main stages:
116@enumerate
117@item Local generation (LGEN)
535b7874 118This stage executes in parallel. Every file in the program is compiled
3abe8cab
JH
119into the intermediate language and packaged together with the local
120call-graph and summary information. This stage is the same for both
121the LTO and WHOPR compilation mode.
122
123@item Whole Program Analysis (WPA)
535b7874
RW
124WPA is performed sequentially. The global call-graph is generated, and
125a global analysis procedure makes transformation decisions. The global
3abe8cab 126call-graph is partitioned to facilitate parallel optimization during
535b7874 127phase 3. The results of the WPA stage are stored into new object files
3abe8cab
JH
128which contain the partitions of program expressed in the intermediate
129language and the optimization decisions.
130
131@item Local transformations (LTRANS)
535b7874 132This stage executes in parallel. All the decisions made during phase 2
3abe8cab 133are implemented locally in each partitioned object file, and the final
535b7874 134object code is generated. Optimizations which cannot be decided
3abe8cab
JH
135efficiently during the phase 2 may be performed on the local
136call-graph partitions.
137@end enumerate
138
139WHOPR can be seen as an extension of the usual LTO mode of
535b7874 140compilation. In LTO, WPA and LTRANS are executed within a single
3abe8cab
JH
141execution of the compiler, after the whole program has been read into
142memory.
143
535b7874 144When compiling in WHOPR mode, the callgraph is partitioned during
3abe8cab
JH
145the WPA stage. The whole program is split into a given number of
146partitions of roughly the same size. The compiler tries to
147minimize the number of references which cross partition boundaries.
148The main advantage of WHOPR is to allow the parallel execution of
149LTRANS stages, which are the most time-consuming part of the
150compilation process. Additionally, it avoids the need to load the
151whole program into memory.
152
153
548e68fc 154@node LTO object file layout
3abe8cab
JH
155@section LTO file sections
156
157LTO information is stored in several ELF sections inside object files.
158Data structures and enum codes for sections are defined in
159@file{lto-streamer.h}.
160
161These sections are emitted from @file{lto-streamer-out.c} and mapped
162in all at once from @file{lto/lto.c}:@code{lto_file_read}. The
163individual functions dealing with the reading/writing of each section
164are described below.
165
166@itemize @bullet
167@item Command line options (@code{.gnu.lto_.opts})
168
169This section contains the command line options used to generate the
535b7874 170object files. This is used at link time to determine the optimization
3abe8cab
JH
171level and other settings when they are not explicitly specified at the
172linker command line.
173
174Currently, GCC does not support combining LTO object files compiled
175with different set of the command line options into a single binary.
535b7874 176At link time, the options given on the command line and the options
3abe8cab
JH
177saved on all the files in a link-time set are applied globally. No
178attempt is made at validating the combination of flags (other than the
179usual validation done by option processing). This is implemented in
180@file{lto/lto.c}:@code{lto_read_all_file_options}.
181
182
183@item Symbol table (@code{.gnu.lto_.symtab})
184
185This table replaces the ELF symbol table for functions and variables
535b7874 186represented in the LTO IL. Symbols used and exported by the optimized
3abe8cab
JH
187assembly code of ``fat'' objects might not match the ones used and
188exported by the intermediate code. This table is necessary because
189the intermediate code is less optimized and thus requires a separate
190symbol table.
191
192Additionally, the binary code in the ``fat'' object will lack a call
193to a function, since the call was optimized out at compilation time
194after the intermediate language was streamed out. In some special
535b7874 195cases, the same optimization may not happen during link-time
3abe8cab
JH
196optimization. This would lead to an undefined symbol if only one
197symbol table was used.
198
199The symbol table is emitted in
200@file{lto-streamer-out.c}:@code{produce_symtab}.
201
202
203@item Global declarations and types (@code{.gnu.lto_.decls})
204
205This section contains an intermediate language dump of all
206declarations and types required to represent the callgraph, static
207variables and top-level debug info.
208
209The contents of this section are emitted in
210@file{lto-streamer-out.c}:@code{produce_asm_for_decls}. Types and
211symbols are emitted in a topological order that preserves the sharing
212of pointers when the file is read back in
213(@file{lto.c}:@code{read_cgraph_and_symbols}).
214
215
216@item The callgraph (@code{.gnu.lto_.cgraph})
217
218This section contains the basic data structure used by the GCC
535b7874 219inter-procedural optimization infrastructure. This section stores an
3abe8cab
JH
220annotated multi-graph which represents the functions and call sites as
221well as the variables, aliases and top-level @code{asm} statements.
222
223This section is emitted in
224@file{lto-streamer-out.c}:@code{output_cgraph} and read in
225@file{lto-cgraph.c}:@code{input_cgraph}.
226
227
228@item IPA references (@code{.gnu.lto_.refs})
229
230This section contains references between function and static
231variables. It is emitted by @file{lto-cgraph.c}:@code{output_refs}
232and read by @file{lto-cgraph.c}:@code{input_refs}.
233
234
235@item Function bodies (@code{.gnu.lto_.function_body.<name>})
236
237This section contains function bodies in the intermediate language
535b7874 238representation. Every function body is in a separate section to allow
3abe8cab
JH
239copying of the section independently to different object files or
240reading the function on demand.
241
242Functions are emitted in
243@file{lto-streamer-out.c}:@code{output_function} and read in
244@file{lto-streamer-in.c}:@code{input_function}.
245
246
247@item Static variable initializers (@code{.gnu.lto_.vars})
248
249This section contains all the symbols in the global variable pool. It
250is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in
251@file{lto-cgraph.c}:@code{input_cgraph}.
252
253@item Summaries and optimization summaries used by IPA passes
254(@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
255@code{pureconst} or @code{reference})
256
257These sections are used by IPA passes that need to emit summary
258information during LTO generation to be read and aggregated at
259link time. Each pass is responsible for implementing two pass manager
260hooks: one for writing the summary and another for reading it in. The
261format of these sections is entirely up to each individual pass. The
262only requirement is that the writer and reader hooks agree on the
263format.
264@end itemize
265
266
548e68fc 267@node IPA
3abe8cab
JH
268@section Using summary information in IPA passes
269
270Programs are represented internally as a @emph{callgraph} (a
271multi-graph where nodes are functions and edges are call sites)
272and a @emph{varpool} (a list of static and external variables in
273the program).
274
275The inter-procedural optimization is organized as a sequence of
276individual passes, which operate on the callgraph and the
277varpool. To make the implementation of WHOPR possible, every
278inter-procedural optimization pass is split into several stages
279that are executed at different times during WHOPR compilation:
280
281@itemize @bullet
282@item LGEN time
283@enumerate
284@item @emph{Generate summary} (@code{generate_summary} in
535b7874 285@code{struct ipa_opt_pass_d}). This stage analyzes every function
3abe8cab
JH
286body and variable initializer is examined and stores relevant
287information into a pass-specific data structure.
288
289@item @emph{Write summary} (@code{write_summary} in
4a087ccf 290@code{struct ipa_opt_pass_d}). This stage writes all the
3abe8cab
JH
291pass-specific information generated by @code{generate_summary}.
292Summaries go into their own @code{LTO_section_*} sections that
293have to be declared in @file{lto-streamer.h}:@code{enum
294lto_section_type}. A new section is created by calling
295@code{create_output_block} and data can be written using the
296@code{lto_output_*} routines.
297@end enumerate
298
299@item WPA time
300@enumerate
301@item @emph{Read summary} (@code{read_summary} in
535b7874 302@code{struct ipa_opt_pass_d}). This stage reads all the
3abe8cab
JH
303pass-specific information in exactly the same order that it was
304written by @code{write_summary}.
305
306@item @emph{Execute} (@code{execute} in @code{struct
307opt_pass}). This performs inter-procedural propagation. This
308must be done without actual access to the individual function
309bodies or variable initializers. Typically, this results in a
310transitive closure operation over the summary information of all
311the nodes in the callgraph.
312
313@item @emph{Write optimization summary}
314(@code{write_optimization_summary} in @code{struct
315ipa_opt_pass_d}). This writes the result of the inter-procedural
316propagation into the object file. This can use the same data
317structures and helper routines used in @code{write_summary}.
318@end enumerate
319
320@item LTRANS time
321@enumerate
322@item @emph{Read optimization summary}
323(@code{read_optimization_summary} in @code{struct
324ipa_opt_pass_d}). The counterpart to
325@code{write_optimization_summary}. This reads the interprocedural
326optimization decisions in exactly the same format emitted by
327@code{write_optimization_summary}.
328
329@item @emph{Transform} (@code{function_transform} and
330@code{variable_transform} in @code{struct ipa_opt_pass_d}).
331The actual function bodies and variable initializers are updated
332based on the information passed down from the @emph{Execute} stage.
333@end enumerate
334@end itemize
335
336The implementation of the inter-procedural passes are shared
337between LTO, WHOPR and classic non-LTO compilation.
338
339@itemize
340@item During the traditional file-by-file mode every pass executes its
341own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
342stages within the single execution context of the compiler.
343
344@item In LTO compilation mode, every pass uses @emph{Generate
345summary} and @emph{Write summary} stages at compilation time,
346while the @emph{Read summary}, @emph{Execute}, and
347@emph{Transform} stages are executed at link time.
348
349@item In WHOPR mode all stages are used.
350@end itemize
351
352To simplify development, the GCC pass manager differentiates
8faf3ed9
XHL
353between normal inter-procedural passes (@pxref{Regular IPA passes}),
354small inter-procedural passes (@pxref{Small IPA passes})
355and late inter-procedural passes (@pxref{Late IPA passes}).
356A small or late IPA pass (@code{SIMPLE_IPA_PASS}) does
357everything at once and thus cannot be executed during WPA in
535b7874 358WHOPR mode. It defines only the @emph{Execute} stage and during
3abe8cab
JH
359this stage it accesses and modifies the function bodies. Such
360passes are useful for optimization at LGEN or LTRANS time and are
361used, for example, to implement early optimization before writing
362object files. The simple inter-procedural passes can also be used
363for easier prototyping and development of a new inter-procedural
364pass.
365
366
367@subsection Virtual clones
368
369One of the main challenges of introducing the WHOPR compilation
370mode was addressing the interactions between optimization passes.
371In LTO compilation mode, the passes are executed in a sequence,
372each of which consists of analysis (or @emph{Generate summary}),
373propagation (or @emph{Execute}) and @emph{Transform} stages.
374Once the work of one pass is finished, the next pass sees the
375updated program representation and can execute. This makes the
376individual passes dependent on each other.
377
378In WHOPR mode all passes first execute their @emph{Generate
379summary} stage. Then summary writing marks the end of the LGEN
380stage. At WPA time,
381the summaries are read back into memory and all passes run the
382@emph{Execute} stage. Optimization summaries are streamed and
383sent to LTRANS, where all the passes execute the @emph{Transform}
384stage.
385
386Most optimization passes split naturally into analysis,
387propagation and transformation stages. But some do not. The
388main problem arises when one pass performs changes and the
389following pass gets confused by seeing different callgraphs
535b7874 390between the @emph{Transform} stage and the @emph{Generate summary}
3abe8cab
JH
391or @emph{Execute} stage. This means that the passes are required
392to communicate their decisions with each other.
393
394To facilitate this communication, the GCC callgraph
395infrastructure implements @emph{virtual clones}, a method of
396representing the changes performed by the optimization passes in
397the callgraph without needing to update function bodies.
398
399A @emph{virtual clone} in the callgraph is a function that has no
400associated body, just a description of how to create its body based
401on a different function (which itself may be a virtual clone).
402
403The description of function modifications includes adjustments to
404the function's signature (which allows, for example, removing or
405adding function arguments), substitutions to perform on the
406function body, and, for inlined functions, a pointer to the
407function that it will be inlined into.
408
409It is also possible to redirect any edge of the callgraph from a
410function to its virtual clone. This implies updating of the call
411site to adjust for the new function signature.
412
413Most of the transformations performed by inter-procedural
414optimizations can be represented via virtual clones. For
415instance, a constant propagation pass can produce a virtual clone
416of the function which replaces one of its arguments by a
417constant. The inliner can represent its decisions by producing a
418clone of a function whose body will be later integrated into
419a given function.
420
421Using @emph{virtual clones}, the program can be easily updated
422during the @emph{Execute} stage, solving most of pass interactions
423problems that would otherwise occur during @emph{Transform}.
424
425Virtual clones are later materialized in the LTRANS stage and
426turned into real functions. Passes executed after the virtual
427clone were introduced also perform their @emph{Transform} stage
428on new functions, so for a pass there is no significant
429difference between operating on a real function or a virtual
430clone introduced before its @emph{Execute} stage.
431
432Optimization passes then work on virtual clones introduced before
433their @emph{Execute} stage as if they were real functions. The
434only difference is that clones are not visible during the
435@emph{Generate Summary} stage.
436
437To keep function summaries updated, the callgraph interface
438allows an optimizer to register a callback that is called every
439time a new clone is introduced as well as when the actual
440function or variable is generated or when a function or variable
441is removed. These hooks are registered in the @emph{Generate
442summary} stage and allow the pass to keep its information intact
443until the @emph{Execute} stage. The same hooks can also be
444registered during the @emph{Execute} stage to keep the
445optimization summaries updated for the @emph{Transform} stage.
446
447@subsection IPA references
448
449GCC represents IPA references in the callgraph. For a function
450or variable @code{A}, the @emph{IPA reference} is a list of all
451locations where the address of @code{A} is taken and, when
452@code{A} is a variable, a list of all direct stores and reads
535b7874 453to/from @code{A}. References represent an oriented multi-graph on
3abe8cab
JH
454the union of nodes of the callgraph and the varpool. See
455@file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary}
456and
457@file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary}
458for details.
459
460@subsection Jump functions
461Suppose that an optimization pass sees a function @code{A} and it
462knows the values of (some of) its arguments. The @emph{jump
463function} describes the value of a parameter of a given function
464call in function @code{A} based on this knowledge.
465
466Jump functions are used by several optimizations, such as the
467inter-procedural constant propagation pass and the
468devirtualization pass. The inliner also uses jump functions to
469perform inlining of callbacks.
470
548e68fc 471@node WHOPR
3abe8cab
JH
472@section Whole program assumptions, linker plugin and symbol visibilities
473
474Link-time optimization gives relatively minor benefits when used
475alone. The problem is that propagation of inter-procedural
476information does not work well across functions and variables
477that are called or referenced by other compilation units (such as
535b7874 478from a dynamically linked library). We say that such functions
a77fa1fc 479and variables are @emph{externally visible}.
3abe8cab
JH
480
481To make the situation even more difficult, many applications
482organize themselves as a set of shared libraries, and the default
483ELF visibility rules allow one to overwrite any externally
484visible symbol with a different symbol at runtime. This
485basically disables any optimizations across such functions and
486variables, because the compiler cannot be sure that the function
487body it is seeing is the same function body that will be used at
488runtime. Any function or variable not declared @code{static} in
489the sources degrades the quality of inter-procedural
490optimization.
491
492To avoid this problem the compiler must assume that it sees the
493whole program when doing link-time optimization. Strictly
494speaking, the whole program is rarely visible even at link-time.
495Standard system libraries are usually linked dynamically or not
496provided with the link-time information. In GCC, the whole
497program option (@option{-fwhole-program}) asserts that every
498function and variable defined in the current compilation
499unit is static, except for function @code{main} (note: at
535b7874 500link time, the current unit is the union of all objects compiled
3abe8cab
JH
501with LTO). Since some functions and variables need to
502be referenced externally, for example by another DSO or from an
503assembler file, GCC also provides the function and variable
504attribute @code{externally_visible} which can be used to disable
505the effect of @option{-fwhole-program} on a specific symbol.
506
507The whole program mode assumptions are slightly more complex in
508C++, where inline functions in headers are put into @emph{COMDAT}
535b7874 509sections. COMDAT function and variables can be defined by
3abe8cab
JH
510multiple object files and their bodies are unified at link-time
511and dynamic link-time. COMDAT functions are changed to local only
512when their address is not taken and thus un-sharing them with a
513library is not harmful. COMDAT variables always remain externally
514visible, however for readonly variables it is assumed that their
515initializers cannot be overwritten by a different value.
516
517GCC provides the function and variable attribute
518@code{visibility} that can be used to specify the visibility of
519externally visible symbols (or alternatively an
520@option{-fdefault-visibility} command line option). ELF defines
521the @code{default}, @code{protected}, @code{hidden} and
522@code{internal} visibilities.
523
535b7874 524The most commonly used is visibility is @code{hidden}. It
3abe8cab 525specifies that the symbol cannot be referenced from outside of
535b7874 526the current shared library. Unfortunately, this information
3abe8cab
JH
527cannot be used directly by the link-time optimization in the
528compiler since the whole shared library also might contain
529non-LTO objects and those are not visible to the compiler.
530
531GCC solves this problem using linker plugins. A @emph{linker
532plugin} is an interface to the linker that allows an external
533program to claim the ownership of a given object file. The linker
534then performs the linking procedure by querying the plugin about
535the symbol table of the claimed objects and once the linking
536decisions are complete, the plugin is allowed to provide the
537final object file before the actual linking is made. The linker
538plugin obtains the symbol resolution information which specifies
539which symbols provided by the claimed objects are bound from the
540rest of a binary being linked.
541
3abe8cab
JH
542GCC is designed to be independent of the rest of the toolchain
543and aims to support linkers without plugin support. For this
544reason it does not use the linker plugin by default. Instead,
545the object files are examined by @command{collect2} before being
546passed to the linker and objects found to have LTO sections are
547passed to @command{lto1} first. This mode does not work for
535b7874 548library archives. The decision on what object files from the
3abe8cab
JH
549archive are needed depends on the actual linking and thus GCC
550would have to implement the linker itself. The resolution
551information is missing too and thus GCC needs to make an educated
552guess based on @option{-fwhole-program}. Without the linker
553plugin GCC also assumes that symbols are declared @code{hidden}
554and not referred by non-LTO code by default.
555
548e68fc 556@node Internal flags
3abe8cab
JH
557@section Internal flags controlling @code{lto1}
558
559The following flags are passed into @command{lto1} and are not
560meant to be used directly from the command line.
561
562@itemize
563@item -fwpa
564@opindex fwpa
565This option runs the serial part of the link-time optimizer
566performing the inter-procedural propagation (WPA mode). The
567compiler reads in summary information from all inputs and
568performs an analysis based on summary information only. It
569generates object files for subsequent runs of the link-time
570optimizer where individual object files are optimized using both
571summary information from the WPA mode and the actual function
572bodies. It then drives the LTRANS phase.
573
574@item -fltrans
575@opindex fltrans
576This option runs the link-time optimizer in the
577local-transformation (LTRANS) mode, which reads in output from a
535b7874 578previous run of the LTO in WPA mode. In the LTRANS mode, LTO
3abe8cab
JH
579optimizes an object and produces the final assembly.
580
581@item -fltrans-output-list=@var{file}
582@opindex fltrans-output-list
583This option specifies a file to which the names of LTRANS output
584files are written. This option is only meaningful in conjunction
585with @option{-fwpa}.
c2679d84
RB
586
587@item -fresolution=@var{file}
588@opindex fresolution
589This option specifies the linker resolution file. This option is
590only meaningful in conjunction with @option{-fwpa} and as option
6404e190 591to pass through to the LTO linker plugin.
3abe8cab 592@end itemize