..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

Implementation Status and Implementation-Defined Behavior
*********************************************************

We're implementing the OpenACC Profiling Interface as defined by the
OpenACC 2.6 specification.  We're clarifying some aspects here as
*implementation-defined behavior*, while they're still under
discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as
possible for the (very common) case that the Profiling Interface is
not enabled.  This is relevant, as the Profiling Interface affects all
the *hot* code paths (in the target code, not in the offloaded
code).  Users of the OpenACC Profiling Interface can be expected to
understand that performance will be impacted to some degree once the
Profiling Interface is enabled: for example, because of the
*runtime* (libgomp) calling into a third-party *library* for
every event that has been registered.

We're not yet accounting for the fact that OpenACC events may
occur during event processing.
We just handle one case specially, as required by CUDA 9.0
:command:`nvprof`: ``acc_get_device_type``
(:ref:`acc_get_device_type`) may be called from
``acc_ev_device_init_start``, ``acc_ev_device_init_end``
callbacks.

We're not yet implementing initialization via an
``acc_register_library`` function that is either statically linked
in, or dynamically loaded via :envvar:`LD_PRELOAD`.
Initialization via ``acc_register_library`` functions dynamically
loaded via the :envvar:`ACC_PROFLIB` environment variable does work, as
does directly calling ``acc_prof_register``,
``acc_prof_unregister``, ``acc_prof_lookup``.
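
As a rough illustration of the :envvar:`ACC_PROFLIB` path, a profiling
library might define ``acc_register_library`` along the following lines.
This is a minimal sketch, not libgomp code: to stay self-contained it
declares simplified stand-ins for the types that ``acc_prof.h`` provides
(a real library would include that header instead), and
``on_device_init`` and ``events_seen`` are hypothetical names.

```c
#include <stddef.h>

/* Simplified stand-ins for declarations from acc_prof.h (assumption:
   a real profiling library would #include <acc_prof.h> instead).  */
typedef enum { acc_ev_none, acc_ev_device_init_start,
               acc_ev_device_init_end } acc_event_t;
typedef enum { acc_reg, acc_toggle, acc_toggle_per_thread } acc_register_t;
typedef struct acc_prof_info { acc_event_t event_type; } acc_prof_info;
typedef struct acc_event_info acc_event_info;
typedef struct acc_api_info acc_api_info;
typedef void (*acc_prof_callback) (acc_prof_info *, acc_event_info *,
                                   acc_api_info *);
typedef void (*acc_prof_reg) (acc_event_t, acc_prof_callback, acc_register_t);
typedef void (*acc_query_fn) (void);
typedef acc_query_fn (*acc_prof_lookup_func) (const char *);

/* Hypothetical counter, incremented once per delivered event.
   Not thread-safe; a real tool would need synchronization.  */
static int events_seen;

/* Hypothetical callback; kept cheap because it runs for every
   registered event.  */
static void
on_device_init (acc_prof_info *prof, acc_event_info *event, acc_api_info *api)
{
  (void) prof; (void) event; (void) api;
  ++events_seen;
}

/* Entry point looked up by the runtime in the library named by
   ACC_PROFLIB.  Registers the callback for both device-init events.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
{
  (void) unreg; (void) lookup;
  reg (acc_ev_device_init_start, on_device_init, acc_reg);
  reg (acc_ev_device_init_end, on_device_init, acc_reg);
}
```

Built as a shared library, such a file would be selected at run time by
setting :envvar:`ACC_PROFLIB` to its path.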

As there are currently no inquiry functions defined, calls to
``acc_prof_lookup`` will always return ``NULL``.
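
A tool therefore has to check every lookup result before calling it.
The following sketch shows one way to do that; the lookup types are
stand-ins for those in ``acc_prof.h``, and ``lookup_or_fallback``,
``null_lookup`` and ``no_query`` are hypothetical helpers, with
``null_lookup`` merely imitating the behavior described above.

```c
#include <stddef.h>

/* Stand-ins for the lookup types declared in acc_prof.h (assumption).  */
typedef void (*acc_query_fn) (void);
typedef acc_query_fn (*acc_prof_lookup_func) (const char *);

/* Imitates a lookup function that knows no inquiry functions and so
   always returns NULL.  */
static acc_query_fn
null_lookup (const char *name)
{
  (void) name;
  return NULL;
}

/* Hypothetical fallback used when an inquiry function is unavailable.  */
static void
no_query (void)
{
}

/* Return the inquiry function called NAME, or FALLBACK if LOOKUP is
   NULL or does not know NAME.  */
static acc_query_fn
lookup_or_fallback (acc_prof_lookup_func lookup, const char *name,
                    acc_query_fn fallback)
{
  acc_query_fn fn = lookup != NULL ? lookup (name) : NULL;
  return fn != NULL ? fn : fallback;
}
```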

There aren't separate *start*, *stop* events defined for the
event types ``acc_ev_create``, ``acc_ev_delete``,
``acc_ev_alloc``, ``acc_ev_free``.  It's not clear if these
should be triggered before or after the actual device-specific call is
made.  We trigger them after the call.

Remarks about data provided to callbacks:

acc_prof_info.event_type
  It's not clear if for *nested* event callbacks (for example,
  ``acc_ev_enqueue_launch_start`` as part of a parent compute
  construct), this should be set for the nested event
  (``acc_ev_enqueue_launch_start``), or if the value of the parent
  construct should remain (``acc_ev_compute_construct_start``).  In
  this implementation, the value will generally correspond to the
  innermost nested event type.

acc_prof_info.device_type
  * For ``acc_ev_compute_construct_start``, and in presence of an
    ``if`` clause with *false* argument, this will still refer to
    the offloading device type.
    It's not clear if that's the expected behavior.

  * Complementary to the item before, for
    ``acc_ev_compute_construct_end``, this is set to
    ``acc_device_host`` in presence of an ``if`` clause with
    *false* argument.
    It's not clear if that's the expected behavior.

acc_prof_info.thread_id
  Always ``-1``; not yet implemented.

acc_prof_info.async
  * Not yet implemented correctly for
    ``acc_ev_compute_construct_start``.

  * In a compute construct, for host-fallback
    execution/``acc_device_host`` it will always be
    ``acc_async_sync``.
    It's not clear if that's the expected behavior.

  * For ``acc_ev_device_init_start`` and ``acc_ev_device_init_end``,
    it will always be ``acc_async_sync``.
    It's not clear if that's the expected behavior.

acc_prof_info.async_queue
  libgomp does not limit the number of asynchronous queues.
  This will always have the same value as ``acc_prof_info.async``.

acc_prof_info.src_file
  Always ``NULL``; not yet implemented.

acc_prof_info.func_name
  Always ``NULL``; not yet implemented.

acc_prof_info.line_no
  Always ``-1``; not yet implemented.

acc_prof_info.end_line_no
  Always ``-1``; not yet implemented.

acc_prof_info.func_line_no
  Always ``-1``; not yet implemented.

acc_prof_info.func_end_line_no
  Always ``-1``; not yet implemented.

acc_event_info.event_type, acc_event_info.\*.event_type
  Relating to ``acc_prof_info.event_type`` discussed above, in this
  implementation, this will always be the same value as
  ``acc_prof_info.event_type``.

acc_event_info.\*.parent_construct
  * Will be ``acc_construct_parallel`` for all OpenACC compute
    constructs as well as many OpenACC Runtime API calls; should be the
    one matching the actual construct, or
    ``acc_construct_runtime_api``, respectively.

  * Will be ``acc_construct_enter_data`` or
    ``acc_construct_exit_data`` when processing variable mappings
    specified in OpenACC *declare* directives; should be
    ``acc_construct_declare``.

  * For implicit ``acc_ev_device_init_start``,
    ``acc_ev_device_init_end``, and explicit as well as implicit
    ``acc_ev_alloc``, ``acc_ev_free``,
    ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``,
    ``acc_ev_enqueue_download_start``, and
    ``acc_ev_enqueue_download_end``, will be
    ``acc_construct_parallel``; should reflect the real parent
    construct.

acc_event_info.\*.implicit
  For ``acc_ev_alloc``, ``acc_ev_free``,
  ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``,
  ``acc_ev_enqueue_download_start``, and
  ``acc_ev_enqueue_download_end``, this currently will be ``1``
  also for explicit usage.

acc_event_info.data_event.var_name
  Always ``NULL``; not yet implemented.

acc_event_info.data_event.host_ptr
  For ``acc_ev_alloc``, and ``acc_ev_free``, this is always
  ``NULL``.

typedef union acc_api_info
  ... as printed in 5.2.3. Third Argument: API-Specific
  Information.  This should obviously be ``typedef struct
  acc_api_info``.

acc_api_info.device_api
  Possibly not yet implemented correctly for
  ``acc_ev_compute_construct_start``,
  ``acc_ev_device_init_start``, ``acc_ev_device_init_end``:
  will always be ``acc_device_api_none`` for these event types.
  For ``acc_ev_enter_data_start``, it will be
  ``acc_device_api_none`` in some cases.

acc_api_info.device_type
  Always the same as ``acc_prof_info.device_type``.

acc_api_info.vendor
  Always ``-1``; not yet implemented.

acc_api_info.device_handle
  Always ``NULL``; not yet implemented.

acc_api_info.context_handle
  Always ``NULL``; not yet implemented.

acc_api_info.async_handle
  Always ``NULL``; not yet implemented.
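
Because several of these fields currently carry placeholder values,
callbacks should check for ``NULL`` and ``-1`` before using them.  The
sketch below shows one way to format a source location defensively;
``prof_location`` is a hypothetical stand-in holding only three of the
``acc_prof_info`` fields, and ``format_location`` is an illustrative
helper, not part of any OpenACC or libgomp API.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in carrying only some source-location fields of acc_prof_info
   (assumption: a real tool would read them from the struct declared
   in acc_prof.h).  */
typedef struct {
  const char *src_file;
  const char *func_name;
  int line_no;
} prof_location;

/* Write a printable location into BUF, substituting placeholders for
   the NULL and -1 values that are reported while the fields are not
   implemented.  Returns BUF.  */
static const char *
format_location (const prof_location *loc, char *buf, size_t size)
{
  const char *file = loc->src_file != NULL ? loc->src_file : "<unknown file>";
  const char *func
    = loc->func_name != NULL ? loc->func_name : "<unknown function>";

  if (loc->line_no >= 0)
    snprintf (buf, size, "%s:%d (%s)", file, loc->line_no, func);
  else
    snprintf (buf, size, "%s (%s)", file, func);
  return buf;
}
```

Checking for the placeholder values rather than for a particular libgomp
version means the same callback keeps working if these fields are filled
in later.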

Remarks about certain event types:

acc_ev_device_init_start, acc_ev_device_init_end
  .. See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
     'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
     'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.

  * When a compute construct triggers implicit
    ``acc_ev_device_init_start`` and ``acc_ev_device_init_end``
    events, they currently aren't *nested within* the corresponding
    ``acc_ev_compute_construct_start`` and
    ``acc_ev_compute_construct_end``, but they're currently observed
    *before* ``acc_ev_compute_construct_start``.
    It's not clear what to do: the standard asks us to provide a lot of
    details to the ``acc_ev_compute_construct_start`` callback, without
    (implicitly) initializing a device before?

  * Callbacks for these event types will not be invoked for calls to the
    ``acc_set_device_type`` and ``acc_set_device_num`` functions.
    It's not clear if they should be.

acc_ev_enter_data_start, acc_ev_enter_data_end, acc_ev_exit_data_start, acc_ev_exit_data_end
  * Callbacks for these event types will also be invoked for OpenACC
    *host_data* constructs.
    It's not clear if they should be.

  * Callbacks for these event types will also be invoked when processing
    variable mappings specified in OpenACC *declare* directives.
    It's not clear if they should be.
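
A tool that tracks event nesting should therefore not assume that
device initialization events arrive inside a compute construct.  The
following sketch counts both cases instead of rejecting either; the
``event_kind`` enumeration is a stand-in for the corresponding
``acc_event_t`` values, and ``nesting_stats`` and ``record_event`` are
hypothetical names.

```c
/* Stand-in for a few acc_event_t values from acc_prof.h (assumption).  */
typedef enum {
  ev_device_init_start, ev_device_init_end,
  ev_compute_construct_start, ev_compute_construct_end
} event_kind;

typedef struct {
  int construct_depth;  /* Number of currently open compute constructs.  */
  int init_outside;     /* Device-init events seen outside any construct.  */
  int init_nested;      /* Device-init events seen inside a construct.  */
} nesting_stats;

/* Update STATS for one observed event, accepting device-init events
   both before and within a compute construct.  */
static void
record_event (nesting_stats *stats, event_kind ev)
{
  switch (ev)
    {
    case ev_compute_construct_start:
      ++stats->construct_depth;
      break;
    case ev_compute_construct_end:
      if (stats->construct_depth > 0)
        --stats->construct_depth;
      break;
    case ev_device_init_start:
    case ev_device_init_end:
      if (stats->construct_depth > 0)
        ++stats->init_nested;
      else
        ++stats->init_outside;
      break;
    }
}
```

With the ordering described above, a compute construct that implicitly
initializes a device would increment ``init_outside`` twice and leave
``init_nested`` at zero.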

Callbacks for the following event types will be invoked, but dispatch
and information provided therein has not yet been thoroughly reviewed:

* ``acc_ev_update_start``, ``acc_ev_update_end``

* ``acc_ev_enqueue_upload_start``, ``acc_ev_enqueue_upload_end``

* ``acc_ev_enqueue_download_start``, ``acc_ev_enqueue_download_end``

During device initialization, and finalization, respectively,
callbacks for the following event types will not yet be invoked:

Callbacks for the following event types have not yet been implemented,
so currently won't be invoked:

* ``acc_ev_device_shutdown_start``, ``acc_ev_device_shutdown_end``

* ``acc_ev_runtime_shutdown``

* ``acc_ev_create``, ``acc_ev_delete``

* ``acc_ev_wait_start``, ``acc_ev_wait_end``
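
Since the shutdown callbacks are not invoked, a profiling library
cannot rely on them to write out its results.  One possible workaround,
sketched here as an assumption rather than a recommendation from the
OpenACC specification, is to register a standard C ``atexit`` handler;
``flush_profile``, ``install_exit_flush``, ``events_recorded`` and
``flushed`` are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical profile state collected by the tool's callbacks.  */
static long events_recorded;
static int flushed;

/* Write out the collected data once; later calls are no-ops, so it is
   safe if a shutdown callback also starts calling it in the future.  */
static void
flush_profile (void)
{
  if (flushed)
    return;
  flushed = 1;
  fprintf (stderr, "profile: %ld events recorded\n", events_recorded);
}

/* Arrange for flush_profile to run at normal process exit, for example
   from the library's acc_register_library function.  Returns 0 on
   success, as atexit does.  */
static int
install_exit_flush (void)
{
  return atexit (flush_profile);
}
```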

For the following runtime library functions, not all expected
callbacks will be invoked (mostly concerning implicit device
initialization):

* ``acc_get_num_devices``

* ``acc_set_device_type``

* ``acc_get_device_type``

* ``acc_set_device_num``

* ``acc_get_device_num``

Aside from implicit device initialization, for the following runtime
library functions, no callbacks will be invoked for shared-memory
offloading devices (it's not clear if they should be):

* ``acc_copyin``, ``acc_present_or_copyin``, ``acc_copyin_async``

* ``acc_create``, ``acc_present_or_create``, ``acc_create_async``

* ``acc_copyout``, ``acc_copyout_async``, ``acc_copyout_finalize``, ``acc_copyout_finalize_async``

* ``acc_delete``, ``acc_delete_async``, ``acc_delete_finalize``, ``acc_delete_finalize_async``

* ``acc_update_device``, ``acc_update_device_async``

* ``acc_update_self``, ``acc_update_self_async``

* ``acc_map_data``, ``acc_unmap_data``

* ``acc_memcpy_to_device``, ``acc_memcpy_to_device_async``

* ``acc_memcpy_from_device``, ``acc_memcpy_from_device_async``