[thirdparty/gcc.git] / gcc / doc / gccint / analysis-and-representation-of-loops / data-dependency-analysis.rst

..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. index:: Data Dependency Analysis

.. _dependency-analysis:

Data Dependency Analysis
************************

The code for the data dependence analysis can be found in
:samp:`tree-data-ref.cc` and its interface and data structures are
described in :samp:`tree-data-ref.h`.  The function that computes the
data dependences for all the array and pointer references for a given
loop is ``compute_data_dependences_for_loop``.  This function is
currently used by the linear loop transform and the vectorization
passes.  Before calling this function, one has to allocate two vectors:
a first vector will contain the set of data references that are
contained in the analyzed loop body, and the second vector will contain
the dependence relations between the data references.  Thus if the
vector of data references is of size ``n``, the vector containing the
dependence relations will contain ``n*n`` elements.  However if the
analyzed loop contains side effects, such as calls that potentially can
interfere with the data references in the current analyzed loop, the
analysis stops while scanning the loop body for data references, and
inserts a single ``chrec_dont_know`` in the dependence relation
array.

The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution order,
and the data references of each statement are pushed at the end of the
data reference array.  Two data references syntactically occur in the
program in the same order as in the array of data references.  This
syntactic order is important in some classical data dependence tests,
and mapping this order to the elements of this array avoids costly
queries to the loop body representation.

Three types of data references are currently handled: ARRAY_REF,
INDIRECT_REF and COMPONENT_REF. The data structure for the data reference
is ``data_reference``, where ``data_reference_p`` is a name of a
pointer to the data reference structure. The structure contains the
following elements:

* ``base_object_info`` : Provides information about the base object
  of the data reference and its access functions. These access functions
  represent the evolution of the data reference in the loop relative to
  its base, in keeping with the classical meaning of the data reference
  access function for the support of arrays. For example, for a reference
  ``a.b[i][j]``, the base object is ``a.b`` and the access functions,
  one for each array subscript, are:
  ``{i_init, + i_step}_1, {j_init, +, j_step}_2``.

* ``first_location_in_loop`` : Provides information about the first
  location accessed by the data reference in the loop and about the access
  function used to represent evolution relative to this location. This data
  is used to support pointers, and is not used for arrays (for which we
  have base objects). Pointer accesses are represented as a one-dimensional
  access that starts from the first location accessed in the loop. For
  example:

  .. code-block:: c++

          for1 i
             for2 j
              *((int *)p + i + j) = a[i][j];

  The access function of the pointer access is ``{0, + 4B}_for2``
  relative to ``p + i``. The access functions of the array are
  ``{i_init, + i_step}_for1`` and ``{j_init, +, j_step}_for2``
  relative to ``a``.

  Usually, the object the pointer refers to is either unknown, or we cannot
  prove that the access is confined to the boundaries of a certain object.

  Two data references can be compared only if at least one of these two
  representations has all its fields filled for both data references.

  The current strategy for data dependence tests is as follows:
  If both ``a`` and ``b`` are represented as arrays, compare
  ``a.base_object`` and ``b.base_object`` ;
  if they are equal, apply dependence tests (use access functions based on
  base_objects).
  Else if both ``a`` and ``b`` are represented as pointers, compare
  ``a.first_location`` and ``b.first_location`` ;
  if they are equal, apply dependence tests (use access functions based on
  first location).
  However, if ``a`` and ``b`` are represented differently, only try
  to prove that the bases are definitely different.

* Aliasing information.

* Alignment information.

The structure describing the relation between two data references is
``data_dependence_relation`` and the shorter name for a pointer to
such a structure is ``ddr_p``.  This structure contains:

* a pointer to each data reference,

* a tree node ``are_dependent`` that is set to ``chrec_known``
  if the analysis has proved that there is no dependence between these two
  data references, ``chrec_dont_know`` if the analysis was not able to
  determine any useful result and potentially there could exist a
  dependence between these data references, and ``are_dependent`` is
  set to ``NULL_TREE`` if there exist a dependence relation between the
  data references, and the description of this dependence relation is
  given in the ``subscripts``, ``dir_vects``, and ``dist_vects``
  arrays,

* a boolean that determines whether the dependence relation can be
  represented by a classical distance vector,

* an array ``subscripts`` that contains a description of each
  subscript of the data references.  Given two array accesses a
  subscript is the tuple composed of the access functions for a given
  dimension.  For example, given ``A[f1][f2][f3]`` and
  ``B[g1][g2][g3]``, there are three subscripts: ``(f1, g1), (f2,
  g2), (f3, g3)``.

* two arrays ``dir_vects`` and ``dist_vects`` that contain
  classical representations of the data dependences under the form of
  direction and distance dependence vectors,

* an array of loops ``loop_nest`` that contains the loops to
  which the distance and direction vectors refer to.

Several functions for pretty printing the information extracted by the
data dependence analysis are available: ``dump_ddrs`` prints with a
maximum verbosity the details of a data dependence relations array,
``dump_dist_dir_vectors`` prints only the classical distance and
direction vectors for a data dependence relations array, and
``dump_data_references`` prints the details of the data references
contained in a data reference array.
Commit	Line	Data
c63539ff ML	1	..
	2	Copyright 1988-2022 Free Software Foundation, Inc.
	3	This is part of the GCC manual.
	4	For copying conditions, see the copyright.rst file.
	5
	6	.. index:: Data Dependency Analysis
	7
	8	.. _dependency-analysis:
	9
	10	Data Dependency Analysis
	11	************************
	12
	13	The code for the data dependence analysis can be found in
	14	:samp:`tree-data-ref.cc` and its interface and data structures are
	15	described in :samp:`tree-data-ref.h`. The function that computes the
	16	data dependences for all the array and pointer references for a given
	17	loop is ``compute_data_dependences_for_loop``. This function is
	18	currently used by the linear loop transform and the vectorization
	19	passes. Before calling this function, one has to allocate two vectors:
	20	a first vector will contain the set of data references that are
	21	contained in the analyzed loop body, and the second vector will contain
	22	the dependence relations between the data references. Thus if the
	23	vector of data references is of size ``n``, the vector containing the
	24	dependence relations will contain ``n*n`` elements. However if the
	25	analyzed loop contains side effects, such as calls that potentially can
	26	interfere with the data references in the current analyzed loop, the
	27	analysis stops while scanning the loop body for data references, and
	28	inserts a single ``chrec_dont_know`` in the dependence relation
	29	array.
	30
	31	The data references are discovered in a particular order during the
	32	scanning of the loop body: the loop body is analyzed in execution order,
	33	and the data references of each statement are pushed at the end of the
	34	data reference array. Two data references syntactically occur in the
	35	program in the same order as in the array of data references. This
	36	syntactic order is important in some classical data dependence tests,
	37	and mapping this order to the elements of this array avoids costly
	38	queries to the loop body representation.
	39
	40	Three types of data references are currently handled: ARRAY_REF,
	41	INDIRECT_REF and COMPONENT_REF. The data structure for the data reference
	42	is ``data_reference``, where ``data_reference_p`` is a name of a
	43	pointer to the data reference structure. The structure contains the
	44	following elements:
	45
	46	* ``base_object_info`` : Provides information about the base object
	47	of the data reference and its access functions. These access functions
	48	represent the evolution of the data reference in the loop relative to
	49	its base, in keeping with the classical meaning of the data reference
	50	access function for the support of arrays. For example, for a reference
	51	``a.b[i][j]``, the base object is ``a.b`` and the access functions,
	52	one for each array subscript, are:
	53	``{i_init, + i_step}_1, {j_init, +, j_step}_2``.
	54
	55	* ``first_location_in_loop`` : Provides information about the first
	56	location accessed by the data reference in the loop and about the access
	57	function used to represent evolution relative to this location. This data
	58	is used to support pointers, and is not used for arrays (for which we
	59	have base objects). Pointer accesses are represented as a one-dimensional
	60	access that starts from the first location accessed in the loop. For
	61	example:
	62
	63	.. code-block:: c++
	64
65	for1 i
66	for2 j
67	((int )p + i + j) = a[i][j];
68
69	The access function of the pointer access is ``{0, + 4B}_for2``
70	relative to ``p + i``. The access functions of the array are
71	``{i_init, + i_step}_for1`` and ``{j_init, +, j_step}_for2``
72	relative to ``a``.
73
74	Usually, the object the pointer refers to is either unknown, or we cannot
75	prove that the access is confined to the boundaries of a certain object.
76
77	Two data references can be compared only if at least one of these two
78	representations has all its fields filled for both data references.
79
80	The current strategy for data dependence tests is as follows:
81	If both ``a`` and ``b`` are represented as arrays, compare
82	``a.base_object`` and ``b.base_object`` ;
83	if they are equal, apply dependence tests (use access functions based on
84	base_objects).
85	Else if both ``a`` and ``b`` are represented as pointers, compare
86	``a.first_location`` and ``b.first_location`` ;
87	if they are equal, apply dependence tests (use access functions based on
88	first location).
89	However, if ``a`` and ``b`` are represented differently, only try
90	to prove that the bases are definitely different.
91
92	* Aliasing information.
93
94	* Alignment information.
95
96	The structure describing the relation between two data references is
97	``data_dependence_relation`` and the shorter name for a pointer to
98	such a structure is ``ddr_p``. This structure contains:
99
100	* a pointer to each data reference,
101
102	* a tree node ``are_dependent`` that is set to ``chrec_known``
103	if the analysis has proved that there is no dependence between these two
104	data references, ``chrec_dont_know`` if the analysis was not able to
105	determine any useful result and potentially there could exist a
106	dependence between these data references, and ``are_dependent`` is
107	set to ``NULL_TREE`` if there exist a dependence relation between the
108	data references, and the description of this dependence relation is
109	given in the ``subscripts``, ``dir_vects``, and ``dist_vects``
110	arrays,
111
112	* a boolean that determines whether the dependence relation can be
113	represented by a classical distance vector,
114
115	* an array ``subscripts`` that contains a description of each
116	subscript of the data references. Given two array accesses a
117	subscript is the tuple composed of the access functions for a given
118	dimension. For example, given ``A[f1][f2][f3]`` and
119	``B[g1][g2][g3]``, there are three subscripts: ``(f1, g1), (f2,
120	g2), (f3, g3)``.
121
122	* two arrays ``dir_vects`` and ``dist_vects`` that contain
123	classical representations of the data dependences under the form of
124	direction and distance dependence vectors,
125
126	* an array of loops ``loop_nest`` that contains the loops to
127	which the distance and direction vectors refer to.
128
129	Several functions for pretty printing the information extracted by the
130	data dependence analysis are available: ``dump_ddrs`` prints with a
131	maximum verbosity the details of a data dependence relations array,
132	``dump_dist_dir_vectors`` prints only the classical distance and
133	direction vectors for a data dependence relations array, and
134	``dump_data_references`` prints the details of the data references
3ed1b4ce	135	contained in a data reference array.