[thirdparty/gcc.git] / gcc / doc / gccint / generic / overview.rst

..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. index:: tree, TREE_CODE

.. _tree-overview:

Overview
********

The central data structure used by the internal representation is the
``tree``.  These nodes, while all of the C type ``tree``, are of
many varieties.  A ``tree`` is a pointer type, but the object to
which it points may be of a variety of types.  From this point forward,
we will refer to trees in ordinary type, rather than in ``this
font``, except when talking about the actual C type ``tree``.

You can tell what kind of node a particular tree is by using the
``TREE_CODE`` macro.  Many, many macros take trees as input and
return trees as output.  However, most macros require a certain kind of
tree node as input.  In other words, there is a type-system for trees,
but it is not reflected in the C type-system.

For safety, it is useful to configure GCC with :option:`--enable-checking`.
Although this results in a significant performance penalty (since all
tree types are checked at run-time), and is therefore inappropriate in a
release version, it is extremely helpful during the development process.

Many macros behave as predicates.  Many, although not all, of these
predicates end in :samp:`_P`.  Do not rely on the result type of these
macros being of any particular type.  You may, however, rely on the fact
that the type can be compared to ``0``, so that statements like

.. code-block:: c++

  if (TEST_P (t) && !TEST_P (y))
    x = 1;

and

.. code-block:: c++

  int i = (TEST_P (t) != 0);

are legal.  Macros that return ``int`` values now may be changed to
return ``tree`` values, or other pointers in the future.  Even those
that continue to return ``int`` may return multiple nonzero codes
where previously they returned only zero and one.  Therefore, you should
not write code like

.. code-block:: c++

  if (TEST_P (t) == 1)

as this code is not guaranteed to work correctly in the future.

You should not take the address of values returned by the macros or
functions described here.  In particular, no guarantee is given that the
values are lvalues.

In general, the names of macros are all in uppercase, while the names of
functions are entirely in lowercase.  There are rare exceptions to this
rule.  You should assume that any macro or function whose name is made
up entirely of uppercase letters may evaluate its arguments more than
once.  You may assume that a macro or function whose name is made up
entirely of lowercase letters will evaluate its arguments only once.

The ``error_mark_node`` is a special tree.  Its tree code is
``ERROR_MARK``, but since there is only ever one node with that code,
the usual practice is to compare the tree against
``error_mark_node``.  (This test is just a test for pointer
equality.)  If an error has occurred during front-end processing the
flag ``errorcount`` will be set.  If the front end has encountered
code it cannot handle, it will issue a message to the user and set
``sorrycount``.  When these flags are set, any macro or function
which normally returns a tree of a particular kind may instead return
the ``error_mark_node``.  Thus, if you intend to do any processing of
erroneous code, you must be prepared to deal with the
``error_mark_node``.

Occasionally, a particular tree slot (like an operand to an expression,
or a particular field in a declaration) will be referred to as
'reserved for the back end'.  These slots are used to store RTL when
the tree is converted to RTL for use by the GCC back end.  However, if
that process is not taking place (e.g., if the front end is being hooked
up to an intelligent editor), then those slots may be used by the
back end presently in use.

If you encounter situations that do not match this documentation, such
as tree nodes of types not mentioned here, or macros documented to
return entities of a particular kind that instead return entities of
some different kind, you have found a bug, either in the front end or in
the documentation.  Please report these bugs as you would any other
bug.

.. toctree::
  :maxdepth: 2


.. -
   Trees
   -

.. index:: tree, TREE_CHAIN, TREE_TYPE

.. _macros-and-functions:

Trees
^^^^^

All GENERIC trees have two fields in common.  First, ``TREE_CHAIN``
is a pointer that can be used as a singly-linked list to other trees.
The other is ``TREE_TYPE``.  Many trees store the type of an
expression or declaration in this field.

These are some other functions for handling trees:

``tree_size``
  Return the number of bytes a tree takes.

``build0``, ``build1``, ``build2``, ``build3``, ``build4``, ``build5``, ``build6``
  These functions build a tree and supply values to put in each
  parameter.  The basic signature is :samp:`code, type, [operands]`.
  ``code`` is the ``TREE_CODE``, and ``type`` is a tree
  representing the ``TREE_TYPE``.  These are followed by the
  operands, each of which is also a tree.

.. -
   Identifiers
   -

.. index:: identifier, name

.. _identifiers:

Identifiers
^^^^^^^^^^^

.. index:: IDENTIFIER_NODE

An ``IDENTIFIER_NODE`` represents a slightly more general concept
than the standard C or C++ concept of identifier.  In particular, an
``IDENTIFIER_NODE`` may contain a :samp:`$`, or other extraordinary
characters.

There are never two distinct ``IDENTIFIER_NODE`` s representing the
same identifier.  Therefore, you may use pointer equality to compare
``IDENTIFIER_NODE`` s, rather than using a routine like
``strcmp``.  Use ``get_identifier`` to obtain the unique
``IDENTIFIER_NODE`` for a supplied string.

You can use the following macros to access identifiers:

.. envvar:: IDENTIFIER_POINTER

  The string represented by the identifier, represented as a
  ``char*``.  This string is always ``NUL`` -terminated, and contains
  no embedded ``NUL`` characters.

.. envvar:: IDENTIFIER_LENGTH

  The length of the string returned by ``IDENTIFIER_POINTER``, not
  including the trailing ``NUL``.  This value of
  ``IDENTIFIER_LENGTH (x)`` is always the same as ``strlen
  (IDENTIFIER_POINTER (x))``.

.. envvar:: IDENTIFIER_OPNAME_P

  This predicate holds if the identifier represents the name of an
  overloaded operator.  In this case, you should not depend on the
  contents of either the ``IDENTIFIER_POINTER`` or the
  ``IDENTIFIER_LENGTH``.

.. envvar:: IDENTIFIER_TYPENAME_P

  This predicate holds if the identifier represents the name of a
  user-defined conversion operator.  In this case, the ``TREE_TYPE`` of
  the ``IDENTIFIER_NODE`` holds the type to which the conversion
  operator converts.

.. -
   Containers
   -

.. index:: container, list, vector

.. _containers:

Containers
^^^^^^^^^^

.. index:: TREE_LIST, TREE_VEC, TREE_PURPOSE, TREE_VALUE, TREE_VEC_LENGTH, TREE_VEC_ELT

Two common container data structures can be represented directly with
tree nodes.  A ``TREE_LIST`` is a singly linked list containing two
trees per node.  These are the ``TREE_PURPOSE`` and ``TREE_VALUE``
of each node.  (Often, the ``TREE_PURPOSE`` contains some kind of
tag, or additional information, while the ``TREE_VALUE`` contains the
majority of the payload.  In other cases, the ``TREE_PURPOSE`` is
simply ``NULL_TREE``, while in still others both the
``TREE_PURPOSE`` and ``TREE_VALUE`` are of equal stature.)  Given
one ``TREE_LIST`` node, the next node is found by following the
``TREE_CHAIN``.  If the ``TREE_CHAIN`` is ``NULL_TREE``, then
you have reached the end of the list.

A ``TREE_VEC`` is a simple vector.  The ``TREE_VEC_LENGTH`` is an
integer (not a tree) giving the number of nodes in the vector.  The
nodes themselves are accessed using the ``TREE_VEC_ELT`` macro, which
takes two arguments.  The first is the ``TREE_VEC`` in question; the
second is an integer indicating which element in the vector is desired.
The elements are indexed from zero.
Commit	Line	Data
c63539ff ML	1	..
	2	Copyright 1988-2022 Free Software Foundation, Inc.
	3	This is part of the GCC manual.
	4	For copying conditions, see the copyright.rst file.
	5
	6	.. index:: tree, TREE_CODE
	7
	8	.. _tree-overview:
	9
	10	Overview
	11	********
	12
	13	The central data structure used by the internal representation is the
	14	``tree``. These nodes, while all of the C type ``tree``, are of
	15	many varieties. A ``tree`` is a pointer type, but the object to
	16	which it points may be of a variety of types. From this point forward,
	17	we will refer to trees in ordinary type, rather than in ``this
	18	font``, except when talking about the actual C type ``tree``.
	19
	20	You can tell what kind of node a particular tree is by using the
	21	``TREE_CODE`` macro. Many, many macros take trees as input and
	22	return trees as output. However, most macros require a certain kind of
	23	tree node as input. In other words, there is a type-system for trees,
	24	but it is not reflected in the C type-system.
	25
	26	For safety, it is useful to configure GCC with :option:`--enable-checking`.
	27	Although this results in a significant performance penalty (since all
	28	tree types are checked at run-time), and is therefore inappropriate in a
	29	release version, it is extremely helpful during the development process.
	30
	31	Many macros behave as predicates. Many, although not all, of these
	32	predicates end in :samp:`_P`. Do not rely on the result type of these
	33	macros being of any particular type. You may, however, rely on the fact
	34	that the type can be compared to ``0``, so that statements like
	35
	36	.. code-block:: c++
	37
	38	if (TEST_P (t) && !TEST_P (y))
	39	x = 1;
	40
	41	and
	42
	43	.. code-block:: c++
	44
	45	int i = (TEST_P (t) != 0);
	46
	47	are legal. Macros that return ``int`` values now may be changed to
	48	return ``tree`` values, or other pointers in the future. Even those
	49	that continue to return ``int`` may return multiple nonzero codes
	50	where previously they returned only zero and one. Therefore, you should
	51	not write code like
	52
	53	.. code-block:: c++
	54
	55	if (TEST_P (t) == 1)
	56
	57	as this code is not guaranteed to work correctly in the future.
	58
	59	You should not take the address of values returned by the macros or
	60	functions described here. In particular, no guarantee is given that the
	61	values are lvalues.
	62
	63	In general, the names of macros are all in uppercase, while the names of
	64	functions are entirely in lowercase. There are rare exceptions to this
65	rule. You should assume that any macro or function whose name is made
66	up entirely of uppercase letters may evaluate its arguments more than
67	once. You may assume that a macro or function whose name is made up
68	entirely of lowercase letters will evaluate its arguments only once.
69
70	The ``error_mark_node`` is a special tree. Its tree code is
71	``ERROR_MARK``, but since there is only ever one node with that code,
72	the usual practice is to compare the tree against
73	``error_mark_node``. (This test is just a test for pointer
74	equality.) If an error has occurred during front-end processing the
75	flag ``errorcount`` will be set. If the front end has encountered
76	code it cannot handle, it will issue a message to the user and set
77	``sorrycount``. When these flags are set, any macro or function
78	which normally returns a tree of a particular kind may instead return
79	the ``error_mark_node``. Thus, if you intend to do any processing of
80	erroneous code, you must be prepared to deal with the
81	``error_mark_node``.
82
83	Occasionally, a particular tree slot (like an operand to an expression,
84	or a particular field in a declaration) will be referred to as
85	'reserved for the back end'. These slots are used to store RTL when
86	the tree is converted to RTL for use by the GCC back end. However, if
87	that process is not taking place (e.g., if the front end is being hooked
88	up to an intelligent editor), then those slots may be used by the
89	back end presently in use.
90
91	If you encounter situations that do not match this documentation, such
92	as tree nodes of types not mentioned here, or macros documented to
93	return entities of a particular kind that instead return entities of
94	some different kind, you have found a bug, either in the front end or in
95	the documentation. Please report these bugs as you would any other
96	bug.
97
98	.. toctree::
99	:maxdepth: 2
100
101
102	.. -
103	Trees
104	-
105
106	.. index:: tree, TREE_CHAIN, TREE_TYPE
107
108	.. _macros-and-functions:
109
110	Trees
111	^^^^^
112
113	All GENERIC trees have two fields in common. First, ``TREE_CHAIN``
114	is a pointer that can be used as a singly-linked list to other trees.
115	The other is ``TREE_TYPE``. Many trees store the type of an
116	expression or declaration in this field.
117
118	These are some other functions for handling trees:
119
120	``tree_size``
121	Return the number of bytes a tree takes.
122
123	``build0``, ``build1``, ``build2``, ``build3``, ``build4``, ``build5``, ``build6``
124	These functions build a tree and supply values to put in each
125	parameter. The basic signature is :samp:`code, type, [operands]`.
126	``code`` is the ``TREE_CODE``, and ``type`` is a tree
127	representing the ``TREE_TYPE``. These are followed by the
128	operands, each of which is also a tree.
129
130	.. -
131	Identifiers
132	-
133
134	.. index:: identifier, name
135
136	.. _identifiers:
137
138	Identifiers
139	^^^^^^^^^^^
140
141	.. index:: IDENTIFIER_NODE
142
143	An ``IDENTIFIER_NODE`` represents a slightly more general concept
144	than the standard C or C++ concept of identifier. In particular, an
145	``IDENTIFIER_NODE`` may contain a :samp:`$`, or other extraordinary
146	characters.
147
148	There are never two distinct ``IDENTIFIER_NODE`` s representing the
149	same identifier. Therefore, you may use pointer equality to compare
150	``IDENTIFIER_NODE`` s, rather than using a routine like
151	``strcmp``. Use ``get_identifier`` to obtain the unique
152	``IDENTIFIER_NODE`` for a supplied string.
153
154	You can use the following macros to access identifiers:
155
156	.. envvar:: IDENTIFIER_POINTER
157
158	The string represented by the identifier, represented as a
159	``char*``. This string is always ``NUL`` -terminated, and contains
160	no embedded ``NUL`` characters.
161
162	.. envvar:: IDENTIFIER_LENGTH
163
164	The length of the string returned by ``IDENTIFIER_POINTER``, not
165	including the trailing ``NUL``. This value of
166	``IDENTIFIER_LENGTH (x)`` is always the same as ``strlen
167	(IDENTIFIER_POINTER (x))``.
168
169	.. envvar:: IDENTIFIER_OPNAME_P
170
171	This predicate holds if the identifier represents the name of an
172	overloaded operator. In this case, you should not depend on the
173	contents of either the ``IDENTIFIER_POINTER`` or the
174	``IDENTIFIER_LENGTH``.
175
176	.. envvar:: IDENTIFIER_TYPENAME_P
177
178	This predicate holds if the identifier represents the name of a
179	user-defined conversion operator. In this case, the ``TREE_TYPE`` of
180	the ``IDENTIFIER_NODE`` holds the type to which the conversion
181	operator converts.
182
183	.. -
184	Containers
185	-
186
187	.. index:: container, list, vector
188
189	.. _containers:
190
191	Containers
192	^^^^^^^^^^
193
194	.. index:: TREE_LIST, TREE_VEC, TREE_PURPOSE, TREE_VALUE, TREE_VEC_LENGTH, TREE_VEC_ELT
195
196	Two common container data structures can be represented directly with
197	tree nodes. A ``TREE_LIST`` is a singly linked list containing two
198	trees per node. These are the ``TREE_PURPOSE`` and ``TREE_VALUE``
199	of each node. (Often, the ``TREE_PURPOSE`` contains some kind of
200	tag, or additional information, while the ``TREE_VALUE`` contains the
201	majority of the payload. In other cases, the ``TREE_PURPOSE`` is
202	simply ``NULL_TREE``, while in still others both the
203	``TREE_PURPOSE`` and ``TREE_VALUE`` are of equal stature.) Given
204	one ``TREE_LIST`` node, the next node is found by following the
205	``TREE_CHAIN``. If the ``TREE_CHAIN`` is ``NULL_TREE``, then
206	you have reached the end of the list.
207
208	A ``TREE_VEC`` is a simple vector. The ``TREE_VEC_LENGTH`` is an
209	integer (not a tree) giving the number of nodes in the vector. The
210	nodes themselves are accessed using the ``TREE_VEC_ELT`` macro, which
211	takes two arguments. The first is the ``TREE_VEC`` in question; the
212	second is an integer indicating which element in the vector is desired.
3ed1b4ce	213	The elements are indexed from zero.