..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. _ipa:

Using summary information in IPA passes
***************************************

Programs are represented internally as a *callgraph* (a
multi-graph where nodes are functions and edges are call sites)
and a *varpool* (a list of static and external variables in
the program).

The inter-procedural optimization is organized as a sequence of
individual passes, which operate on the callgraph and the
varpool. To make the implementation of WHOPR possible, every
inter-procedural optimization pass is split into several stages
that are executed at different times during WHOPR compilation
(a sketch of a pass descriptor wiring these stages together
follows the list):

* LGEN time

  * *Generate summary* (``generate_summary`` in
    ``struct ipa_opt_pass_d``). This stage examines every function
    body and variable initializer and stores relevant
    information into a pass-specific data structure.

  * *Write summary* (``write_summary`` in
    ``struct ipa_opt_pass_d``). This stage writes all the
    pass-specific information generated by ``generate_summary``.
    Summaries go into their own ``LTO_section_*`` sections that
    have to be declared in :samp:`lto-streamer.h`: ``enum
    lto_section_type``. A new section is created by calling
    ``create_output_block`` and data can be written using the
    ``lto_output_*`` routines (see the streaming sketch after
    the list).

* WPA time

  * *Read summary* (``read_summary`` in
    ``struct ipa_opt_pass_d``). This stage reads all the
    pass-specific information in exactly the same order that it was
    written by ``write_summary``.

  * *Execute* (``execute`` in ``struct
    opt_pass``). This performs inter-procedural propagation. This
    must be done without actual access to the individual function
    bodies or variable initializers. Typically, this results in a
    transitive closure operation over the summary information of all
    the nodes in the callgraph.

  * *Write optimization summary*
    (``write_optimization_summary`` in ``struct
    ipa_opt_pass_d``). This writes the result of the inter-procedural
    propagation into the object file. This can use the same data
    structures and helper routines used in ``write_summary``.

* LTRANS time

  * *Read optimization summary*
    (``read_optimization_summary`` in ``struct
    ipa_opt_pass_d``). The counterpart to
    ``write_optimization_summary``. This reads the inter-procedural
    optimization decisions in exactly the same format emitted by
    ``write_optimization_summary``.

  * *Transform* (``function_transform`` and
    ``variable_transform`` in ``struct ipa_opt_pass_d``).
    The actual function bodies and variable initializers are updated
    based on the information passed down from the *Execute* stage.

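
The following sketch shows how such a pass descriptor could tie the
stages together. Everything named ``my_pass_*``, as well as the pass
name, is a hypothetical example rather than an existing pass; only the
general shape of the ``pass_data`` initializer and of the
``ipa_opt_pass_d`` constructor follows :samp:`tree-pass.h`, and both
should be checked against the current sources.

.. code-block:: c++

  /* Illustrative sketch of an IPA pass descriptor; all my_pass_* names
     are hypothetical.  Within the GCC tree the usual include preamble
     would be roughly the following.  */
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "tree.h"
  #include "cgraph.h"
  #include "tree-pass.h"

  /* Stage implementations, with the signatures expected by
     struct ipa_opt_pass_d.  */
  static void my_pass_generate_summary (void);             /* LGEN */
  static void my_pass_write_summary (void);                /* LGEN */
  static void my_pass_read_summary (void);                 /* WPA */
  static void my_pass_write_optimization_summary (void);   /* WPA */
  static void my_pass_read_optimization_summary (void);    /* LTRANS */
  static unsigned int my_pass_transform (cgraph_node *);   /* LTRANS */
  static unsigned int my_pass_propagate (void);            /* WPA */

  static const pass_data pass_data_my_pass =
  {
    IPA_PASS,          /* type */
    "my-pass",         /* name */
    OPTGROUP_NONE,     /* optinfo_flags */
    TV_NONE,           /* tv_id */
    0,                 /* properties_required */
    0,                 /* properties_provided */
    0,                 /* properties_destroyed */
    0,                 /* todo_flags_start */
    0,                 /* todo_flags_finish */
  };

  class pass_my_pass : public ipa_opt_pass_d
  {
  public:
    pass_my_pass (gcc::context *ctxt)
      : ipa_opt_pass_d (pass_data_my_pass, ctxt,
                        my_pass_generate_summary,
                        my_pass_write_summary,
                        my_pass_read_summary,
                        my_pass_write_optimization_summary,
                        my_pass_read_optimization_summary,
                        NULL,               /* stmt_fixup */
                        0,                  /* function_transform_todo_flags_start */
                        my_pass_transform,  /* function_transform */
                        NULL)               /* variable_transform */
    {}

    /* The Execute stage: inter-procedural propagation at WPA time,
       working only on the summaries, never on function bodies.  */
    virtual unsigned int execute (function *) { return my_pass_propagate (); }
  };

The *Write summary* stage streams its data into the pass's own
``LTO_section_*`` section. The sketch below is modeled loosely on the
pattern used by existing passes such as :samp:`ipa-prop.cc`;
``LTO_section_my_pass``, the ``my_pass_summary`` record and its
accessor are again assumptions made up for this example.

.. code-block:: c++

  /* Hypothetical per-function data built by my_pass_generate_summary.  */
  struct my_pass_summary { unsigned int argument_count; };
  static my_pass_summary *my_pass_get_summary (cgraph_node *);

  static void
  my_pass_write_summary (void)
  {
    /* The streaming helpers come from lto-streamer.h and data-streamer.h.
       LTO_section_my_pass would have to be added to enum lto_section_type.  */
    struct output_block *ob = create_output_block (LTO_section_my_pass);
    lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
    lto_symtab_encoder_iterator lsei;

    /* First walk over the functions in this partition: count the nodes
       that have a summary to stream.  */
    unsigned int count = 0;
    for (lsei = lsei_start_function_in_partition (encoder);
         !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
      if (my_pass_get_summary (lsei_cgraph_node (lsei)))
        count++;
    streamer_write_uhwi (ob, count);

    /* Second walk: one record per function, a reference to the callgraph
       node followed by the pass-specific payload.  */
    for (lsei = lsei_start_function_in_partition (encoder);
         !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
      {
        cgraph_node *node = lsei_cgraph_node (lsei);
        if (my_pass_summary *sum = my_pass_get_summary (node))
          {
            streamer_write_uhwi (ob, lto_symtab_encoder_encode (encoder, node));
            streamer_write_uhwi (ob, sum->argument_count);
          }
      }

    produce_asm (ob, NULL);
    destroy_output_block (ob);
  }
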
The implementation of the inter-procedural passes is shared
between LTO, WHOPR and classic non-LTO compilation.

* During the traditional file-by-file mode every pass executes its
  own *Generate summary*, *Execute*, and *Transform*
  stages within the single execution context of the compiler.

* In LTO compilation mode, every pass uses *Generate
  summary* and *Write summary* stages at compilation time,
  while the *Read summary*, *Execute*, and
  *Transform* stages are executed at link time.

* In WHOPR mode all stages are used.

To simplify development, the GCC pass manager differentiates
between normal inter-procedural passes (see :ref:`regular-ipa-passes`),
small inter-procedural passes (see :ref:`small-ipa-passes`)
and late inter-procedural passes (see :ref:`late-ipa-passes`).
A small or late IPA pass (``SIMPLE_IPA_PASS``) does
everything at once and thus cannot be executed during WPA in
WHOPR mode. It defines only the *Execute* stage and during
this stage it accesses and modifies the function bodies. Such
passes are useful for optimization at LGEN or LTRANS time and are
used, for example, to implement early optimization before writing
object files. The simple inter-procedural passes can also be used
for easier prototyping and development of a new inter-procedural
pass.

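A simple IPA pass therefore needs no summary hooks at all; a minimal
sketch might look as follows, where the pass name and the body of
``execute`` are illustrative assumptions and ``pass_data_my_simple_pass``
would be defined like the ``pass_data`` initializer shown earlier, but
with type ``SIMPLE_IPA_PASS``.

.. code-block:: c++

  class pass_my_simple_pass : public simple_ipa_opt_pass
  {
  public:
    pass_my_simple_pass (gcc::context *ctxt)
      : simple_ipa_opt_pass (pass_data_my_simple_pass, ctxt)
    {}

    /* Everything happens in the Execute stage: the pass may walk the
       callgraph and touch function bodies directly, which is why it
       cannot run at WPA time in WHOPR mode.  */
    virtual unsigned int execute (function *)
    {
      cgraph_node *node;
      FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
        {
          /* Inspect or rewrite the body of NODE here.  */
        }
      return 0;
    }
  };
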
Virtual clones
^^^^^^^^^^^^^^

One of the main challenges of introducing the WHOPR compilation
mode was addressing the interactions between optimization passes.
In LTO compilation mode, the passes are executed in a sequence,
each of which consists of analysis (or *Generate summary*),
propagation (or *Execute*) and *Transform* stages.
Once the work of one pass is finished, the next pass sees the
updated program representation and can execute. This makes the
individual passes dependent on each other.

In WHOPR mode all passes first execute their *Generate
summary* stage. Then summary writing marks the end of the LGEN
stage. At WPA time,
the summaries are read back into memory and all passes run the
*Execute* stage. Optimization summaries are streamed and
sent to LTRANS, where all the passes execute the *Transform*
stage.

Most optimization passes split naturally into analysis,
propagation and transformation stages. But some do not. The
main problem arises when one pass performs changes and the
following pass gets confused by seeing different callgraphs
between the *Transform* stage and the *Generate summary*
or *Execute* stage. This means that the passes are required
to communicate their decisions with each other.

To facilitate this communication, the GCC callgraph
infrastructure implements *virtual clones*, a method of
representing the changes performed by the optimization passes in
the callgraph without needing to update function bodies.

A *virtual clone* in the callgraph is a function that has no
associated body, just a description of how to create its body based
on a different function (which itself may be a virtual clone).

The description of function modifications includes adjustments to
the function's signature (which allows, for example, removing or
adding function arguments), substitutions to perform on the
function body, and, for inlined functions, a pointer to the
function that it will be inlined into.

It is also possible to redirect any edge of the callgraph from a
function to its virtual clone. This implies updating the call
site to adjust for the new function signature.

Most of the transformations performed by inter-procedural
optimizations can be represented via virtual clones. For
instance, a constant propagation pass can produce a virtual clone
of the function which replaces one of its arguments by a
constant. The inliner can represent its decisions by producing a
clone of a function whose body will be later integrated into
a given function.

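As a source-level illustration of the constant-propagation case (the
function names and the commentary below are only for exposition; the
clone exists in the callgraph long before any body like this is
created):

.. code-block:: c++

  /* Original function; suppose every caller passes 4 as FACTOR.  */
  static int
  scale (int x, int factor)
  {
    return x * factor;
  }

  /* IPA constant propagation can create a virtual clone of scale in
     which FACTOR is removed from the signature and replaced by the
     constant 4 in the body.  During WPA the clone is only a callgraph
     node plus that description; a body equivalent to the following is
     produced only when the clone is materialized at LTRANS time.  */
  static int
  scale_constprop (int x)
  {
    return x * 4;
  }

  /* Callgraph edges that used to point to scale are redirected to the
     clone, and each call site is adjusted for the new signature:
     scale (v, 4) becomes scale_constprop (v).  */
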
Using *virtual clones*, the program can be easily updated
during the *Execute* stage, solving most of the pass interaction
problems that would otherwise occur during *Transform*.

Virtual clones are later materialized in the LTRANS stage and
turned into real functions. Passes executed after the virtual
clone was introduced also perform their *Transform* stage
on the new functions, so for a pass there is no significant
difference between operating on a real function and a virtual
clone introduced before its *Execute* stage.

Optimization passes then work on virtual clones introduced before
their *Execute* stage as if they were real functions. The
only difference is that clones are not visible during the
*Generate summary* stage.

To keep function summaries updated, the callgraph interface
allows an optimizer to register a callback that is called every
time a new clone is introduced as well as when the actual
function or variable is generated or when a function or variable
is removed. These hooks are registered in the *Generate
summary* stage and allow the pass to keep its information intact
until the *Execute* stage. The same hooks can also be
registered during the *Execute* stage to keep the
optimization summaries updated for the *Transform* stage.

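A rough sketch of how such callbacks might be registered from a
*Generate summary* stage is shown below. The ``my_pass_*`` callbacks
and the summary bookkeeping they perform are hypothetical; the
``add_cgraph_*_hook`` registration functions are the hook interface of
the symbol table (see :samp:`cgraph.h` for the exact set of hooks,
which also includes insertion hooks for newly generated functions and
the corresponding varpool hooks).

.. code-block:: c++

  /* Called when NODE is duplicated into a (virtual) clone NODE2: copy or
     remap the pass's summary so the clone is covered at Execute time.  */
  static void
  my_pass_duplication_hook (cgraph_node *node, cgraph_node *node2, void *)
  {
    my_pass_copy_summary (node, node2);   /* hypothetical */
  }

  /* Called when NODE is removed from the callgraph.  */
  static void
  my_pass_removal_hook (cgraph_node *node, void *)
  {
    my_pass_release_summary (node);       /* hypothetical */
  }

  static void
  my_pass_generate_summary (void)
  {
    symtab->add_cgraph_duplication_hook (my_pass_duplication_hook, NULL);
    symtab->add_cgraph_removal_hook (my_pass_removal_hook, NULL);

    /* ... analyze function bodies and fill in the summaries ...  */
  }
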
IPA references
^^^^^^^^^^^^^^

GCC represents IPA references in the callgraph. For a function
or variable ``A``, the *IPA reference* is a list of all
locations where the address of ``A`` is taken and, when
``A`` is a variable, a list of all direct stores and reads
to/from ``A``. References represent an oriented multi-graph on
the union of nodes of the callgraph and the varpool. See
:samp:`ipa-reference.cc`: ``ipa_reference_write_optimization_summary``
and
:samp:`ipa-reference.cc`: ``ipa_reference_read_optimization_summary``
for details.

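As an illustration, a translation unit such as the following would,
roughly speaking, give the function node three references to the
varpool node of ``counter`` (the reference kinds named in the comments
are those of ``enum ipa_ref_use`` in :samp:`ipa-ref.h`; the exact set
of references depends on earlier optimizations):

.. code-block:: c++

  int counter;              /* represented by a varpool node */

  int *
  bump (void)
  {
    counter = counter + 1;  /* IPA_REF_LOAD and IPA_REF_STORE of counter */
    return &counter;        /* IPA_REF_ADDR: the address of counter is taken */
  }
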
Jump functions
^^^^^^^^^^^^^^

Suppose that an optimization pass sees a function ``A`` and it
knows the values of (some of) its arguments. The *jump
function* describes the value of a parameter of a given function
call in function ``A`` based on this knowledge.

Jump functions are used by several optimizations, such as the
inter-procedural constant propagation pass and the
devirtualization pass. The inliner also uses jump functions to
perform inlining of callbacks.
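
For example, in the (purely illustrative) code below the call to
``callee`` gives rise to one jump function per argument; the kinds
named in the comments are those of ``enum jump_func_type`` in
:samp:`ipa-prop.h`:

.. code-block:: c++

  struct S { int field; };

  static int
  callee (int x, int y, struct S *p)
  {
    return x + y + p->field;
  }

  int
  caller (int a, struct S *s)
  {
    /* The jump functions for this call site describe each argument of
       callee in terms of the environment of caller:
         x: pass-through of caller's parameter a  (IPA_JF_PASS_THROUGH)
         y: known constant 7                      (IPA_JF_CONST)
         p: pass-through of caller's parameter s  (IPA_JF_PASS_THROUGH)  */
    return callee (a, 7, s);
  }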