..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

Implementation Status and Implementation-Defined Behavior
*********************************************************

We're implementing the OpenACC Profiling Interface as defined by the
OpenACC 2.6 specification. We're clarifying some aspects here as
*implementation-defined behavior*, while they're still under
discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as
possible for the (very common) case that the Profiling Interface is
not enabled. This matters because the Profiling Interface affects all
the *hot* code paths (in the target code, not in the offloaded
code). Users of the OpenACC Profiling Interface can be expected to
understand that performance will be impacted to some degree once the
Profiling Interface has been enabled: for example, because the
*runtime* (libgomp) calls into a third-party *library* for
every event that has been registered.

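The low-overhead goal described above can be pictured with a minimal
sketch. This is not libgomp's actual code: every ``demo_`` name is
hypothetical, and the single "enabled" flag is only an assumption about
what such a fast path might look like. The point is that a hot path pays
for one check when nothing is registered, and pays for the callbacks only
once profiling is in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in libgomp.  */
typedef void (*demo_callback) (int event);

enum { DEMO_MAX_CALLBACKS = 4 };

bool demo_prof_enabled;          /* stays false until something registers */
demo_callback demo_callbacks[DEMO_MAX_CALLBACKS];
size_t demo_n_callbacks;
int demo_events_seen;            /* bumped by the sample callback below */

/* Sample "third-party" callback: it only counts events.  */
void
demo_count_event (int event)
{
  (void) event;
  demo_events_seen++;
}

/* Register a callback; the first registration turns profiling on.  */
bool
demo_register (demo_callback cb)
{
  if (cb == NULL || demo_n_callbacks == DEMO_MAX_CALLBACKS)
    return false;
  demo_callbacks[demo_n_callbacks++] = cb;
  demo_prof_enabled = true;
  return true;
}

/* Called from a hot path.  Returns the number of callbacks invoked.  */
size_t
demo_dispatch (int event)
{
  /* Common case: profiling is off, so return after a single check.  */
  if (!demo_prof_enabled)
    return 0;

  for (size_t i = 0; i < demo_n_callbacks; i++)
    demo_callbacks[i] (event);
  return demo_n_callbacks;
}
```

Before any call to ``demo_register``, ``demo_dispatch`` returns ``0``
without touching the callback table; afterwards every dispatched event
costs one indirect call per registered callback.
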
We're not yet accounting for the fact that OpenACC events may
occur during event processing.
We handle only one case specially, as required by CUDA 9.0
:command:`nvprof`: ``acc_get_device_type``
(:ref:`acc_get_device_type`) may be called from
``acc_ev_device_init_start`` and ``acc_ev_device_init_end``
callbacks.

We're not yet implementing initialization via an
``acc_register_library`` function that is either statically linked
in, or dynamically loaded via :envvar:`LD_PRELOAD`.
Initialization via ``acc_register_library`` functions dynamically
loaded via the :envvar:`ACC_PROFLIB` environment variable does work, as
does directly calling ``acc_prof_register``,
``acc_prof_unregister``, and ``acc_prof_lookup``.

As there are currently no inquiry functions defined, calls to
``acc_prof_lookup`` will always return ``NULL``.

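As a rough illustration of the initialization path that does work, the
following sketch shows the shape of an ``acc_register_library`` function
as it might appear in a library named by :envvar:`ACC_PROFLIB`. So that
the sketch compiles without an OpenACC toolchain, it declares reduced
stand-ins for the types that ``acc_prof.h`` provides; real code would
include that header instead. All ``demo_`` names, including the
``"demo_inquiry"`` lookup string and the fake registration and lookup
functions used to exercise the entry point, are hypothetical.

```c
#include <stddef.h>

/* Reduced stand-ins for declarations from <acc_prof.h>.  In real code,
   delete this part and include the header instead.  */
typedef enum { acc_ev_device_init_start = 1,
               acc_ev_device_init_end = 2 } acc_event_t;
typedef enum { acc_reg = 0 } acc_register_t;
typedef struct acc_prof_info acc_prof_info;
typedef union acc_event_info acc_event_info;
typedef struct acc_api_info acc_api_info;
typedef void (*acc_prof_callback) (acc_prof_info *, acc_event_info *,
                                   acc_api_info *);
typedef void (*acc_prof_reg) (acc_event_t, acc_prof_callback,
                              acc_register_t);
typedef void (*acc_query_fn) (void);
typedef acc_query_fn (*acc_prof_lookup_func) (const char *);

int demo_init_events;    /* hypothetical: device-init events observed */
int demo_have_inquiry;   /* 1 if the lookup returned a function */
int demo_n_registered;   /* bumped by the fake registration function */

static void
demo_on_device_init (acc_prof_info *pi, acc_event_info *ei,
                     acc_api_info *ai)
{
  (void) pi; (void) ei; (void) ai;
  demo_init_events++;
}

/* Entry point that the runtime calls after loading the library.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
{
  (void) unreg;
  reg (acc_ev_device_init_start, demo_on_device_init, acc_reg);
  reg (acc_ev_device_init_end, demo_on_device_init, acc_reg);

  /* "demo_inquiry" is a made-up name.  With no inquiry functions
     defined, the lookup yields NULL, so check before ever calling it.  */
  acc_query_fn fn = lookup != NULL ? lookup ("demo_inquiry") : NULL;
  demo_have_inquiry = (fn != NULL);
}

/* Fakes standing in for the runtime, only to exercise the sketch.  */
void
demo_fake_reg (acc_event_t ev, acc_prof_callback cb, acc_register_t info)
{
  (void) ev; (void) info;
  if (cb != NULL)
    demo_n_registered++;
}

acc_query_fn
demo_fake_lookup (const char *name)
{
  (void) name;
  return NULL;
}
```

Calling ``acc_register_library (demo_fake_reg, demo_fake_reg,
demo_fake_lookup)`` registers two callbacks and leaves
``demo_have_inquiry`` at ``0``, mirroring the ``NULL`` result described
above.
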
There aren't separate *start* and *stop* events defined for the
event types ``acc_ev_create``, ``acc_ev_delete``,
``acc_ev_alloc``, and ``acc_ev_free``. It's not clear if these
should be triggered before or after the actual device-specific call is
made. We trigger them after.

Remarks about data provided to callbacks:

acc_prof_info.event_type
  It's not clear if for *nested* event callbacks (for example,
  ``acc_ev_enqueue_launch_start`` as part of a parent compute
  construct), this should be set for the nested event
  (``acc_ev_enqueue_launch_start``), or if the value of the parent
  construct should remain (``acc_ev_compute_construct_start``). In
  this implementation, the value will generally correspond to the
  innermost nested event type.

acc_prof_info.device_type
  * For ``acc_ev_compute_construct_start``, and in the presence of an
    ``if`` clause with a *false* argument, this will still refer to
    the offloading device type.
    It's not clear if that's the expected behavior.

  * Complementary to the item before, for
    ``acc_ev_compute_construct_end``, this is set to
    ``acc_device_host`` in the presence of an ``if`` clause with a
    *false* argument.
    It's not clear if that's the expected behavior.

acc_prof_info.thread_id
  Always ``-1``; not yet implemented.

acc_prof_info.async
  * Not yet implemented correctly for
    ``acc_ev_compute_construct_start``.

  * In a compute construct, for host-fallback execution or
    ``acc_device_host``, it will always be ``acc_async_sync``.
    It's not clear if that's the expected behavior.

  * For ``acc_ev_device_init_start`` and ``acc_ev_device_init_end``,
    it will always be ``acc_async_sync``.
    It's not clear if that's the expected behavior.

acc_prof_info.async_queue
  libgomp does not limit the number of asynchronous queues.
  This will always have the same value as ``acc_prof_info.async``.

acc_prof_info.src_file
  Always ``NULL``; not yet implemented.

acc_prof_info.func_name
  Always ``NULL``; not yet implemented.

acc_prof_info.line_no
  Always ``-1``; not yet implemented.

acc_prof_info.end_line_no
  Always ``-1``; not yet implemented.

acc_prof_info.func_line_no
  Always ``-1``; not yet implemented.

acc_prof_info.func_end_line_no
  Always ``-1``; not yet implemented.

acc_event_info.event_type, acc_event_info.\*.event_type
  Relating to ``acc_prof_info.event_type`` discussed above, in this
  implementation, this will always be the same value as
  ``acc_prof_info.event_type``.

acc_event_info.\*.parent_construct
  * Will be ``acc_construct_parallel`` for all OpenACC compute
    constructs as well as many OpenACC Runtime API calls; should be the
    one matching the actual construct, or
    ``acc_construct_runtime_api``, respectively.

  * Will be ``acc_construct_enter_data`` or
    ``acc_construct_exit_data`` when processing variable mappings
    specified in OpenACC *declare* directives; should be
    ``acc_construct_declare``.

  * For implicit ``acc_ev_device_init_start``,
    ``acc_ev_device_init_end``, and explicit as well as implicit
    ``acc_ev_alloc``, ``acc_ev_free``,
    ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``,
    ``acc_ev_enqueue_download_start``, and
    ``acc_ev_enqueue_download_end``, will be
    ``acc_construct_parallel``; should reflect the real parent
    construct.

acc_event_info.\*.implicit
  For ``acc_ev_alloc``, ``acc_ev_free``,
  ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``,
  ``acc_ev_enqueue_download_start``, and
  ``acc_ev_enqueue_download_end``, this currently will be ``1``
  also for explicit usage.

acc_event_info.data_event.var_name
  Always ``NULL``; not yet implemented.

acc_event_info.data_event.host_ptr
  For ``acc_ev_alloc`` and ``acc_ev_free``, this is always
  ``NULL``.

typedef union acc_api_info
  ... as printed in *5.2.3. Third Argument: API-Specific
  Information*. This should obviously be ``typedef struct
  acc_api_info``.

acc_api_info.device_api
  Possibly not yet implemented correctly for
  ``acc_ev_compute_construct_start``,
  ``acc_ev_device_init_start``, and ``acc_ev_device_init_end``:
  will always be ``acc_device_api_none`` for these event types.
  For ``acc_ev_enter_data_start``, it will be
  ``acc_device_api_none`` in some cases.

acc_api_info.device_type
  Always the same as ``acc_prof_info.device_type``.

acc_api_info.vendor
  Always ``-1``; not yet implemented.

acc_api_info.device_handle
  Always ``NULL``; not yet implemented.

acc_api_info.context_handle
  Always ``NULL``; not yet implemented.

acc_api_info.async_handle
  Always ``NULL``; not yet implemented.

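Because several of the fields listed above are currently fixed at
``NULL`` or ``-1``, a callback that prints them probably wants to guard
each access. The following sketch shows one way to do that. The
``demo_prof_info`` structure is a reduced, hypothetical stand-in for the
real ``acc_prof_info`` (only four of its fields, in an arbitrary order),
and the ``"?"`` placeholder is just a choice made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Reduced, hypothetical stand-in for acc_prof_info from <acc_prof.h>.  */
typedef struct demo_prof_info
{
  int thread_id;
  const char *src_file;
  const char *func_name;
  int line_no;
} demo_prof_info;

/* Render an integer field, using "?" for the -1 "not implemented"
   marker (or any other negative value).  */
static void
demo_int_or_unknown (char *out, size_t size, int value)
{
  if (value < 0)
    snprintf (out, size, "?");
  else
    snprintf (out, size, "%d", value);
}

/* Format "file:line (func) [thread N]" into BUF, substituting "?" for
   every value that is unset.  Returns what snprintf returns.  */
int
demo_format_location (const demo_prof_info *pi, char *buf, size_t size)
{
  char line[16], thread[16];

  demo_int_or_unknown (line, sizeof line, pi->line_no);
  demo_int_or_unknown (thread, sizeof thread, pi->thread_id);
  return snprintf (buf, size, "%s:%s (%s) [thread %s]",
                   pi->src_file != NULL ? pi->src_file : "?", line,
                   pi->func_name != NULL ? pi->func_name : "?", thread);
}
```

With every field unset, as described above, the formatted string is
``?:? (?) [thread ?]``; passing ``NULL`` straight to a ``%s`` conversion
instead would be undefined behavior.
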
Remarks about certain event types:

acc_ev_device_init_start, acc_ev_device_init_end
  *
    .. See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
       'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
       'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.

    When a compute construct triggers implicit
    ``acc_ev_device_init_start`` and ``acc_ev_device_init_end``
    events, they currently aren't *nested within* the corresponding
    ``acc_ev_compute_construct_start`` and
    ``acc_ev_compute_construct_end``; instead, they're observed
    *before* ``acc_ev_compute_construct_start``.
    It's not clear what to do: the standard asks us to provide a lot of
    details to the ``acc_ev_compute_construct_start`` callback, without
    (implicitly) initializing a device before?

  * Callbacks for these event types will not be invoked for calls to the
    ``acc_set_device_type`` and ``acc_set_device_num`` functions.
    It's not clear if they should be.

acc_ev_enter_data_start, acc_ev_enter_data_end, acc_ev_exit_data_start, acc_ev_exit_data_end
  * Callbacks for these event types will also be invoked for OpenACC
    *host_data* constructs.
    It's not clear if they should be.

  * Callbacks for these event types will also be invoked when processing
    variable mappings specified in OpenACC *declare* directives.
    It's not clear if they should be.

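Since implicit device initialization is currently reported before
``acc_ev_compute_construct_start`` rather than nested inside the compute
construct, a profiling tool may prefer not to hard-code either ordering.
The sketch below is one hypothetical way to stay neutral: it tracks
compute-construct depth and simply records where each device
initialization was seen. The ``demo_`` enumerators are stand-ins for the
corresponding ``acc_ev_*`` values, and nothing here is taken from
libgomp.

```c
/* Hypothetical stand-ins for the corresponding acc_ev_* event types.  */
enum demo_event
{
  DEMO_DEVICE_INIT_START,
  DEMO_DEVICE_INIT_END,
  DEMO_COMPUTE_START,
  DEMO_COMPUTE_END
};

typedef struct demo_tracker
{
  int compute_depth;         /* > 0 while inside a compute construct */
  int init_inside_compute;   /* device inits seen inside a construct */
  int init_outside_compute;  /* device inits seen outside any construct */
} demo_tracker;

/* Update the tracker for one event without assuming that device
   initialization is nested within a compute construct.  */
void
demo_track (demo_tracker *t, enum demo_event ev)
{
  switch (ev)
    {
    case DEMO_DEVICE_INIT_START:
      if (t->compute_depth > 0)
        t->init_inside_compute++;
      else
        t->init_outside_compute++;
      break;
    case DEMO_DEVICE_INIT_END:
      break;
    case DEMO_COMPUTE_START:
      t->compute_depth++;
      break;
    case DEMO_COMPUTE_END:
      if (t->compute_depth > 0)
        t->compute_depth--;
      break;
    }
}
```

Fed the order described above (device initialization start and end,
then compute construct start and end), the tracker counts one
initialization outside a compute construct and none inside; it would
keep working unchanged if the events were nested instead.
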
Callbacks for the following event types will be invoked, but their
dispatch and the information provided to them have not yet been
thoroughly reviewed:

* ``acc_ev_alloc``

* ``acc_ev_free``

* ``acc_ev_update_start``, ``acc_ev_update_end``

* ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``

* ``acc_ev_enqueue_download_start``, ``acc_ev_enqueue_download_end``

During device initialization and finalization, callbacks for the
following event types will not yet be invoked:

* ``acc_ev_alloc``

* ``acc_ev_free``

Callbacks for the following event types have not yet been implemented,
so currently won't be invoked:

* ``acc_ev_device_shutdown_start``, ``acc_ev_device_shutdown_end``

* ``acc_ev_runtime_shutdown``

* ``acc_ev_create``, ``acc_ev_delete``

* ``acc_ev_wait_start``, ``acc_ev_wait_end``

For the following runtime library functions, not all expected
callbacks will be invoked (mostly concerning implicit device
initialization):

* ``acc_get_num_devices``

* ``acc_set_device_type``

* ``acc_get_device_type``

* ``acc_set_device_num``

* ``acc_get_device_num``

* ``acc_init``

* ``acc_shutdown``

Aside from implicit device initialization, for the following runtime
library functions, no callbacks will be invoked for shared-memory
offloading devices (it's not clear if they should be):

* ``acc_malloc``

* ``acc_free``

* ``acc_copyin``, ``acc_present_or_copyin``, ``acc_copyin_async``

* ``acc_create``, ``acc_present_or_create``, ``acc_create_async``

* ``acc_copyout``, ``acc_copyout_async``, ``acc_copyout_finalize``, ``acc_copyout_finalize_async``

* ``acc_delete``, ``acc_delete_async``, ``acc_delete_finalize``, ``acc_delete_finalize_async``

* ``acc_update_device``, ``acc_update_device_async``

* ``acc_update_self``, ``acc_update_self_async``

* ``acc_map_data``, ``acc_unmap_data``

* ``acc_memcpy_to_device``, ``acc_memcpy_to_device_async``

* ``acc_memcpy_from_device``, ``acc_memcpy_from_device_async``