\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2024 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software. Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp). GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library. This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page

@node Top, Enabling OpenMP
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library. This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library. Based
on this, support for OpenACC and offloading (both OpenACC and OpenMP
4's target construct) was added later on, and the library's name
changed to GNU Offloading and Multi Processing Runtime Library.

@comment
@comment When you add a new menu item, please keep the right hand
@comment aligned to the same column. Do not use tabs. This provides
@comment better formatting.
@comment
@menu
* Enabling OpenMP::                  How to enable OpenMP for your applications.
* OpenMP Implementation Status::     List of implemented features by OpenMP version
* OpenMP Runtime Library Routines: Runtime Library Routines.
                                     The OpenMP runtime application programming
                                     interface.
* OpenMP Environment Variables: Environment Variables.
                                     Influencing OpenMP runtime behavior with
                                     environment variables.
* Enabling OpenACC::                 How to enable OpenACC for your
                                     applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                                     programming interface.
* OpenACC Environment Variables::    Influencing OpenACC runtime behavior with
                                     environment variables.
* CUDA Streams Usage::               Notes on the implementation of
                                     asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                                     NVIDIA CUBLAS library.
* OpenACC Profiling Interface::
* OpenMP-Implementation Specifics::  Notes on specifics of this OpenMP
                                     implementation
* Offload-Target Specifics::         Notes on offload-target specific internals
* The libgomp ABI::                  Notes on the external ABI presented by libgomp.
* Reporting Bugs::                   How to report bugs in the GNU Offloading and
                                     Multi Processing Runtime Library.
* Copying::                          GNU General Public License says
                                     how you can copy and share libgomp.
* GNU Free Documentation License::
                                     How you can copy and share this manual.
* Funding::                          How to help assure continued work for free
                                     software.
* Library Index::                    Index of this documentation.
@end menu

@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @option{-fopenmp} must be specified. For C and C++, this enables
the handling of the OpenMP directives using @code{#pragma omp} and the
@code{[[omp::directive(...)]]}, @code{[[omp::sequence(...)]]} and
@code{[[omp::decl(...)]]} attributes. For Fortran free source form, it
enables the @code{!$omp} sentinel for directives and the @code{!$}
conditional compilation sentinel; for fixed source form, it enables the
@code{c$omp}, @code{*$omp} and @code{!$omp} sentinels for directives and
the @code{c$}, @code{*$} and @code{!$} conditional compilation sentinels.
The flag also arranges for automatic linking of the OpenMP runtime library
(@ref{Runtime Library Routines}).

The @option{-fopenmp-simd} flag can be used to enable a subset of
OpenMP directives that do not require the linking of either the
OpenMP runtime library or the POSIX threads library.

A complete description of all OpenMP directives may be found in the
@uref{https://www.openmp.org, OpenMP Application Program Interface} manuals.
See also @ref{OpenMP Implementation Status}.
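
As a minimal sketch of what @option{-fopenmp} changes for C code, the
function below sums integers with a worksharing loop. The name
@code{sum_to} is made up for this illustration. When built with
@option{-fopenmp}, the iterations are divided among the threads of a team;
when built without it, the pragma is ignored and the same loop runs
serially, giving the same result.

```c
/* Sum the integers 1..n.  The pragma only takes effect when the file is
   compiled with -fopenmp; otherwise the compiler ignores it and the loop
   runs on a single thread.  */
long sum_to(long n)
{
  long total = 0;

  /* Each thread accumulates a private copy of `total`; the copies are
     combined with + when the loop ends.  */
#pragma omp parallel for reduction(+ : total)
  for (long i = 1; i <= n; i++)
    total += i;

  return total;
}
```

A possible build line is @code{gcc -fopenmp example.c}, where
@file{example.c} is a placeholder file name that also provides a
@code{main} function calling @code{sum_to}.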

@c ---------------------------------------------------------------------
@c OpenMP Implementation Status
@c ---------------------------------------------------------------------

@node OpenMP Implementation Status
@chapter OpenMP Implementation Status

@menu
* OpenMP 4.5:: Feature completion status to 4.5 specification
* OpenMP 5.0:: Feature completion status to 5.0 specification
* OpenMP 5.1:: Feature completion status to 5.1 specification
* OpenMP 5.2:: Feature completion status to 5.2 specification
* OpenMP Technical Report 12:: Feature completion status to second 6.0 preview
@end menu

The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
parameter, provided by @code{omp_lib.h} and the @code{omp_lib} module, have
the value @code{201511} (i.e. OpenMP 4.5).
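
The macro can also be tested in C or C++ code to find out whether OpenMP
is enabled for a translation unit. The helper below is a hypothetical
illustration and not part of libgomp: it returns the value of
@code{_OPENMP} when the file is compiled with @option{-fopenmp}, and 0
otherwise.

```c
/* Report the OpenMP version date (yyyymm) announced by the compiler,
   or 0 when the file was compiled without OpenMP support.  */
long openmp_version_date(void)
{
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}
```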

@node OpenMP 4.5
@section OpenMP 4.5

The OpenMP 4.5 specification is fully supported.

184@node OpenMP 5.0
185@section OpenMP 5.0
186
187@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
188@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2
189
190@multitable @columnfractions .60 .10 .25
191@headitem Description @tab Status @tab Comments
192@item Array shaping @tab N @tab
193@item Array sections with non-unit strides in C and C++ @tab N @tab
194@item Iterators @tab Y @tab
195@item @code{metadirective} directive @tab N @tab
196@item @code{declare variant} directive
197 @tab P @tab @emph{simd} traits not handled correctly
2cd0689a 198@item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
d77de738 199 env variable @tab Y @tab
2cd0689a 200@item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
d77de738 201@item @code{requires} directive @tab P
8c2fc744 202 @tab complete but no non-host device provides @code{unified_shared_memory}
d77de738 203@item @code{teams} construct outside an enclosing target region @tab Y @tab
204@item Non-rectangular loop nests @tab P
205 @tab Full support for C/C++, partial for Fortran
206 (@uref{https://gcc.gnu.org/PR110735,PR110735})
207@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
208@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
209 constructs @tab Y @tab
87f9b6c2 210@item Collapse of associated loops that are imperfectly nested loops @tab Y @tab
211@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
212 @code{simd} construct @tab Y @tab
213@item @code{atomic} constructs in @code{simd} @tab Y @tab
214@item @code{loop} construct @tab Y @tab
215@item @code{order(concurrent)} clause @tab Y @tab
216@item @code{scan} directive and @code{in_scan} modifier for the
217 @code{reduction} clause @tab Y @tab
218@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
219@item @code{in_reduction} clause on @code{target} constructs @tab P
220 @tab @code{nowait} only stub
221@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
222@item @code{task} modifier to @code{reduction} clause @tab Y @tab
223@item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
224@item @code{detach} clause to @code{task} construct @tab Y @tab
225@item @code{omp_fulfill_event} runtime routine @tab Y @tab
226@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
227 and @code{taskloop simd} constructs @tab Y @tab
228@item @code{taskloop} construct cancelable by @code{cancel} construct
229 @tab Y @tab
230@item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
231 @tab Y @tab
232@item Predefined memory spaces, memory allocators, allocator traits
13c3e29d 233 @tab Y @tab See also @ref{Memory allocation}
d77de738 234@item Memory management routines @tab Y @tab
235@item @code{allocate} directive @tab P
236 @tab Only C for stack/automatic and Fortran for stack/automatic
237 and allocatable/pointer variables
238@item @code{allocate} clause @tab P @tab Initial support
239@item @code{use_device_addr} clause on @code{target data} @tab Y @tab
f84fdb13 240@item @code{ancestor} modifier on @code{device} clause @tab Y @tab
241@item Implicit declare target directive @tab Y @tab
242@item Discontiguous array section with @code{target update} construct
243 @tab N @tab
244@item C/C++'s lvalue expressions in @code{to}, @code{from}
245 and @code{map} clauses @tab N @tab
246@item C/C++'s lvalue expressions in @code{depend} clauses @tab Y @tab
247@item Nested @code{declare target} directive @tab Y @tab
248@item Combined @code{master} constructs @tab Y @tab
249@item @code{depend} clause on @code{taskwait} @tab Y @tab
250@item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
251 @tab Y @tab
252@item @code{hint} clause on the @code{atomic} construct @tab Y @tab Stub only
253@item @code{depobj} construct and depend objects @tab Y @tab
254@item Lock hints were renamed to synchronization hints @tab Y @tab
255@item @code{conditional} modifier to @code{lastprivate} clause @tab Y @tab
256@item Map-order clarifications @tab P @tab
257@item @code{close} @emph{map-type-modifier} @tab Y @tab
258@item Mapping C/C++ pointer variables and to assign the address of
259 device memory mapped by an array section @tab P @tab
260@item Mapping of Fortran pointer and allocatable variables, including pointer
261 and allocatable components of variables
262 @tab P @tab Mapping of vars with allocatable components unsupported
263@item @code{defaultmap} extensions @tab Y @tab
264@item @code{declare mapper} directive @tab N @tab
265@item @code{omp_get_supported_active_levels} routine @tab Y @tab
266@item Runtime routines and environment variables to display runtime thread
267 affinity information @tab Y @tab
268@item @code{omp_pause_resource} and @code{omp_pause_resource_all} runtime
269 routines @tab Y @tab
270@item @code{omp_get_device_num} runtime routine @tab Y @tab
271@item OMPT interface @tab N @tab
272@item OMPD interface @tab N @tab
273@end multitable
274
275@unnumberedsubsec Other new OpenMP 5.0 features
276
277@multitable @columnfractions .60 .10 .25
278@headitem Description @tab Status @tab Comments
279@item Supporting C++'s range-based for loop @tab Y @tab
280@end multitable
281
282
283@node OpenMP 5.1
284@section OpenMP 5.1
285
286@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
287
288@multitable @columnfractions .60 .10 .25
289@headitem Description @tab Status @tab Comments
290@item OpenMP directive as C++ attribute specifiers @tab Y @tab
291@item @code{omp_all_memory} reserved locator @tab Y @tab
292@item @emph{target_device trait} in OpenMP Context @tab N @tab
293@item @code{target_device} selector set in context selectors @tab N @tab
294@item C/C++'s @code{declare variant} directive: elision support of
295 preprocessed code @tab N @tab
296@item @code{declare variant}: new clauses @code{adjust_args} and
297 @code{append_args} @tab N @tab
298@item @code{dispatch} construct @tab N @tab
299@item device-specific ICV settings with environment variables @tab Y @tab
eda38850 300@item @code{assume} and @code{assumes} directives @tab Y @tab
301@item @code{nothing} directive @tab Y @tab
302@item @code{error} directive @tab Y @tab
303@item @code{masked} construct @tab Y @tab
304@item @code{scope} directive @tab Y @tab
305@item Loop transformation constructs @tab N @tab
306@item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
307 clauses of the @code{taskloop} construct @tab Y @tab
1a554a2c 308@item @code{align} clause in @code{allocate} directive @tab P
d4b6d147 309 @tab Only C and Fortran (and not for static variables)
b2e1c49b 310@item @code{align} modifier in @code{allocate} clause @tab Y @tab
311@item @code{thread_limit} clause to @code{target} construct @tab Y @tab
312@item @code{has_device_addr} clause to @code{target} construct @tab Y @tab
313@item Iterators in @code{target update} motion clauses and @code{map}
314 clauses @tab N @tab
315@item Indirect calls to the device version of a procedure or function in
a49c7d31 316 @code{target} regions @tab P @tab Only C and C++
317@item @code{interop} directive @tab N @tab
318@item @code{omp_interop_t} object support in runtime routines @tab N @tab
319@item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
320@item Extensions to the @code{atomic} directive @tab Y @tab
321@item @code{seq_cst} clause on a @code{flush} construct @tab Y @tab
322@item @code{inoutset} argument to the @code{depend} clause @tab Y @tab
323@item @code{private} and @code{firstprivate} argument to @code{default}
324 clause in C and C++ @tab Y @tab
4ede915d 325@item @code{present} argument to @code{defaultmap} clause @tab Y @tab
326@item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
327 @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
328 routines @tab Y @tab
329@item @code{omp_target_is_accessible} runtime routine @tab Y @tab
330@item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
331 runtime routines @tab Y @tab
332@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
333@item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
334 @code{omp_aligned_calloc} runtime routines @tab Y @tab
335@item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
336 @code{omp_atv_default} changed @tab Y @tab
337@item @code{omp_display_env} runtime routine @tab Y @tab
338@item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
339@item @code{ompt_sync_region_t} enum additions @tab N @tab
340@item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}
341 and @code{ompt_state_wait_barrier_teams} @tab N @tab
342@item @code{ompt_callback_target_data_op_emi_t},
343 @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
344 and @code{ompt_callback_target_submit_emi_t} @tab N @tab
345@item @code{ompt_callback_error_t} type @tab N @tab
346@item @code{OMP_PLACES} syntax extensions @tab Y @tab
347@item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
348 variables @tab Y @tab
349@end multitable
350
351@unnumberedsubsec Other new OpenMP 5.1 features
352
353@multitable @columnfractions .60 .10 .25
354@headitem Description @tab Status @tab Comments
355@item Support of strictly structured blocks in Fortran @tab Y @tab
356@item Support of structured block sequences in C/C++ @tab Y @tab
357@item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
358 clause @tab Y @tab
359@item Support @code{begin/end declare target} syntax in C/C++ @tab Y @tab
360@item Pointer predetermined firstprivate getting initialized
361to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
362@item For Fortran, diagnose placing declarative before/between @code{USE},
363 @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
eda38850 364@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
a49c7d31 365@item @code{indirect} clause in @code{declare target} @tab P @tab Only C and C++
c16e85d7 366@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
367@item @code{present} modifier to the @code{map}, @code{to} and @code{from}
368 clauses @tab Y @tab
369@end multitable
370
371
372@node OpenMP 5.2
373@section OpenMP 5.2
374
375@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
376
377@multitable @columnfractions .60 .10 .25
378@headitem Description @tab Status @tab Comments
2cd0689a 379@item @code{omp_in_explicit_task} routine and @var{explicit-task-var} ICV
380 @tab Y @tab
@item @code{omp}/@code{ompx}/@code{omx} sentinels and @code{omp_}/@code{ompx_}
      namespaces @tab N/A
      @tab warning for @code{ompx/omx} sentinels@footnote{The @code{ompx}
      sentinel as C/C++ pragma and C++ attributes are warned for with
      @code{-Wunknown-pragmas} (implied by @code{-Wall}) and @code{-Wattributes}
      (enabled by default), respectively; for Fortran free-source code, there is
      a warning enabled by default and, for fixed-source code, the @code{omx}
      sentinel is warned for with @code{-Wsurprising} (enabled by
      @code{-Wall}). Unknown clauses are always rejected with an error.}
091b6dbc 390@item Clauses on @code{end} directive can be on directive @tab Y @tab
0698c9fd 391@item @code{destroy} clause with destroy-var argument on @code{depobj}
1802f64e 392 @tab Y @tab
393@item Deprecation of no-argument @code{destroy} clause on @code{depobj}
394 @tab N @tab
395@item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
396@item Deprecation of minus operator for reductions @tab N @tab
397@item Deprecation of separating @code{map} modifiers without comma @tab N @tab
398@item @code{declare mapper} with iterator and @code{present} modifiers
399 @tab N @tab
400@item If a matching mapped list item is not found in the data environment, the
b25ea7ab 401 pointer retains its original value @tab Y @tab
402@item New @code{enter} clause as alias for @code{to} on declare target directive
403 @tab Y @tab
404@item Deprecation of @code{to} clause on declare target directive @tab N @tab
405@item Extended list of directives permitted in Fortran pure procedures
2df7e451 406 @tab Y @tab
d4b6d147 407@item New @code{allocators} directive for Fortran @tab Y @tab
408@item Deprecation of @code{allocate} directive for Fortran
409 allocatables/pointers @tab N @tab
410@item Optional paired @code{end} directive with @code{dispatch} @tab N @tab
411@item New @code{memspace} and @code{traits} modifiers for @code{uses_allocators}
412 @tab N @tab
413@item Deprecation of traits array following the allocator_handle expression in
414 @code{uses_allocators} @tab N @tab
415@item New @code{otherwise} clause as alias for @code{default} on metadirectives
416 @tab N @tab
417@item Deprecation of @code{default} clause on metadirectives @tab N @tab
418@item Deprecation of delimited form of @code{declare target} @tab N @tab
419@item Reproducible semantics changed for @code{order(concurrent)} @tab N @tab
420@item @code{allocate} and @code{firstprivate} clauses on @code{scope}
421 @tab Y @tab
422@item @code{ompt_callback_work} @tab N @tab
9f80367e 423@item Default map-type for the @code{map} clause in @code{target enter/exit data}
424 @tab Y @tab
425@item New @code{doacross} clause as alias for @code{depend} with
426 @code{source}/@code{sink} modifier @tab Y @tab
427@item Deprecation of @code{depend} with @code{source}/@code{sink} modifier
428 @tab N @tab
429@item @code{omp_cur_iteration} keyword @tab Y @tab
430@end multitable
431
432@unnumberedsubsec Other new OpenMP 5.2 features
433
434@multitable @columnfractions .60 .10 .25
435@headitem Description @tab Status @tab Comments
436@item For Fortran, optional comma between directive and clause @tab N @tab
437@item Conforming device numbers and @code{omp_initial_device} and
438 @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
2cd0689a 439@item Initial value of @var{default-device-var} ICV with
18c8b56c 440 @code{OMP_TARGET_OFFLOAD=mandatory} @tab Y @tab
0698c9fd 441@item @code{all} as @emph{implicit-behavior} for @code{defaultmap} @tab Y @tab
442@item @emph{interop_types} in any position of the modifier list for the @code{init} clause
443 of the @code{interop} construct @tab N @tab
444@item Invoke virtual member functions of C++ objects created on the host device
445 on other devices @tab N @tab
446@end multitable
447
448
449@node OpenMP Technical Report 12
450@section OpenMP Technical Report 12
c16e85d7 451
fcddf7ce 452Technical Report (TR) 12 is the second preview for OpenMP 6.0.
453
454@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
455@multitable @columnfractions .60 .10 .25
456@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
457 @tab N/A @tab Backward compatibility
458@item Full support for C23 was added @tab P @tab
459@item Full support for C++23 was added @tab P @tab
460@item @code{_ALL} suffix to the device-scope environment variables
461 @tab P @tab Host device number wrongly accepted
462@item @code{num_threads} now accepts a list @tab N @tab
463@item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
464@item Extension of @code{OMP_DEFAULT_DEVICE} and new
465 @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab
466@item New @code{OMP_THREADS_RESERVE} environment variable @tab N @tab
467@item The @code{decl} attribute was added to the C++ attribute syntax
468 @tab Y @tab
469@item The OpenMP directive syntax was extended to include C 23 attribute
470 specifiers @tab Y @tab
@item All inarguable clauses now take an optional Boolean argument @tab N @tab
@item For Fortran, @emph{locator list} can also be a function reference with
      data pointer result @tab N @tab
474@item Concept of @emph{assumed-size arrays} in C and C++
475 @tab N @tab
476@item @emph{directive-name-modifier} accepted in all clauses @tab N @tab
477@item For Fortran, atomic with BLOCK construct and, for C/C++, with
478 unlimited curly braces supported @tab N @tab
479@item For Fortran, atomic compare with storing the comparison result
480 @tab N @tab
481@item New @code{looprange} clause @tab N @tab
482@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
483 @tab N @tab
fcddf7ce 484@item Support for inductions @tab N @tab
485@item Implicit reduction identifiers of C++ classes
486 @tab N @tab
487@item Change of the @emph{map-type} property from @emph{ultimate} to
488 @emph{default} @tab N @tab
489@item @code{self} modifier to @code{map} and @code{self} as
490 @code{defaultmap} argument @tab N @tab
491@item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
492 @tab N @tab
493@item @code{groupprivate} directive @tab N @tab
fcddf7ce 494@item @code{local} clause to @code{declare target} directive @tab N @tab
495@item @code{part_size} allocator trait @tab N @tab
496@item @code{pin_device}, @code{preferred_device} and @code{target_access}
497 allocator traits
498 @tab N @tab
499@item @code{access} allocator trait changes @tab N @tab
500@item Extension of @code{interop} operation of @code{append_args}, allowing all
501 modifiers of the @code{init} clause
9f80367e 502 @tab N @tab
c16e85d7 503@item @code{interop} clause to @code{dispatch} @tab N @tab
@item @code{message} and @code{severity} clauses to @code{parallel} directive
      @tab N @tab
506@item @code{self} clause to @code{requires} directive @tab N @tab
507@item @code{no_openmp_constructs} assumptions clause @tab N @tab
508@item @code{reverse} loop-transformation construct @tab N @tab
509@item @code{interchange} loop-transformation construct @tab N @tab
510@item @code{fuse} loop-transformation construct @tab N @tab
511@item @code{apply} code to loop-transforming constructs @tab N @tab
512@item @code{omp_curr_progress_width} identifier @tab N @tab
513@item @code{safesync} clause to the @code{parallel} construct @tab N @tab
514@item @code{omp_get_max_progress_width} runtime routine @tab N @tab
8da7476c 515@item @code{strict} modifier keyword to @code{num_threads} @tab N @tab
516@item @code{atomic} permitted in a construct with @code{order(concurrent)}
517 @tab N @tab
518@item @code{coexecute} directive for Fortran @tab N @tab
519@item Fortran DO CONCURRENT as associated loop in a @code{loop} construct
520 @tab N @tab
521@item @code{threadset} clause in task-generating constructs @tab N @tab
522@item @code{nowait} clause with reverse-offload @code{target} directives
523 @tab N @tab
524@item Boolean argument to @code{nowait} and @code{nogroup} may be non constant
525 @tab N @tab
c16e85d7 526@item @code{memscope} clause to @code{atomic} and @code{flush} @tab N @tab
527@item @code{omp_is_free_agent} and @code{omp_ancestor_is_free_agent} routines
528 @tab N @tab
529@item @code{omp_target_memset} and @code{omp_target_memset_rect_async} routines
530 @tab N @tab
531@item Routines for obtaining memory spaces/allocators for shared/device memory
532 @tab N @tab
533@item @code{omp_get_memspace_num_resources} routine @tab N @tab
534@item @code{omp_get_submemspace} routine @tab N @tab
535@item @code{ompt_target_data_transfer} and @code{ompt_target_data_transfer_async}
536 values in @code{ompt_target_data_op_t} enum @tab N @tab
c16e85d7 537@item @code{ompt_get_buffer_limits} OMPT routine @tab N @tab
538@end multitable
539
fcddf7ce 540@unnumberedsubsec Other new TR 12 features
541@multitable @columnfractions .60 .10 .25
542@item Relaxed Fortran restrictions to the @code{aligned} clause @tab N @tab
543@item Mapping lambda captures @tab N @tab
@item New @code{omp_pause_stop_tool} constant for @code{omp_pause_resource} @tab N @tab
545@end multitable
546
547
548
@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 18 of the OpenMP
specification in version 5.2.

@menu
* Thread Team Routines::
* Thread Affinity Routines::
* Teams Region Routines::
* Tasking Routines::
@c * Resource Relinquishing Routines::
* Device Information Routines::
* Device Memory Routines::
* Lock Routines::
* Timing Routines::
* Event Routine::
@c * Interoperability Routines::
* Memory Management Routines::
@c * Tool Control Routine::
@c * Environment Display Routine::
@end menu



@node Thread Team Routines
@section Thread Team Routines

Routines controlling threads in the current contention group.
They have C linkage and do not throw exceptions.

@menu
* omp_set_num_threads::              Set upper team size limit
* omp_get_num_threads::              Size of the active team
* omp_get_max_threads::              Maximum number of threads of parallel region
* omp_get_thread_num::               Current thread ID
* omp_in_parallel::                  Whether a parallel region is active
* omp_set_dynamic::                  Enable/disable dynamic teams
* omp_get_dynamic::                  Dynamic teams setting
* omp_get_cancellation::             Whether cancellation support is enabled
* omp_set_nested::                   Enable/disable nested parallel regions
* omp_get_nested::                   Nested parallel regions
* omp_set_schedule::                 Set the runtime scheduling method
* omp_get_schedule::                 Obtain the runtime scheduling method
* omp_get_teams_thread_limit::       Maximum number of threads imposed by teams
* omp_get_supported_active_levels::  Maximum number of active regions supported
* omp_set_max_active_levels::        Limits the number of active parallel regions
* omp_get_max_active_levels::        Current maximum number of active regions
* omp_get_level::                    Number of parallel regions
* omp_get_ancestor_thread_num::      Ancestor thread ID
* omp_get_team_size::                Number of threads in a team
* omp_get_active_level::             Number of active parallel regions
@end menu



@node omp_set_num_threads
@subsection @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
regions, if those do not specify a @code{num_threads} clause. The
argument of @code{omp_set_num_threads} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item                   @tab @code{integer, intent(in) :: num_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table
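
The following sketch shows one way @code{omp_set_num_threads} might be
used from C. The function name @code{team_size_after_request} and the
serial fallback definitions are assumptions made for this illustration so
that the file also builds without @option{-fopenmp}; they are not part of
libgomp. The function requests a team size and reports the size that the
next parallel region actually used, which may be smaller than the request.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (illustration only) for builds without -fopenmp.  */
static void omp_set_num_threads(int num_threads) { (void) num_threads; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Request `requested` threads for subsequent parallel regions and return
   the team size observed in the next one.  */
int team_size_after_request(int requested)
{
  int observed = 0;

  omp_set_num_threads(requested);

#pragma omp parallel
  {
    /* Exactly one thread of the team records the team size.  */
#pragma omp single
    observed = omp_get_num_threads();
  }

  return observed;
}
```

The request acts as an upper limit: the runtime may form a smaller team,
for example when dynamic adjustment of the team size is enabled.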



@node omp_get_num_threads
@subsection @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team. In a sequential section of
the program, @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable. At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}. If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per online CPU is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table
667
668
669
506f068e
TB
670@node omp_get_max_threads
671@subsection @code{omp_get_max_threads} -- Maximum number of threads of parallel region
d77de738
ML
672@table @asis
673@item @emph{Description}:
506f068e
TB
674Return the maximum number of threads used for the current parallel region
675that does not use the clause @code{num_threads}.
d77de738 676
506f068e 677@item @emph{C/C++}:
d77de738 678@multitable @columnfractions .20 .80
506f068e 679@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
d77de738
ML
680@end multitable
681
682@item @emph{Fortran}:
683@multitable @columnfractions .20 .80
506f068e 684@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
d77de738
ML
685@end multitable
686
687@item @emph{See also}:
506f068e 688@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
d77de738
ML
689
690@item @emph{Reference}:
506f068e 691@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
d77de738
ML
692@end table
693
694
695
506f068e
TB
696@node omp_get_thread_num
697@subsection @code{omp_get_thread_num} -- Current thread ID
d77de738
ML
698@table @asis
699@item @emph{Description}:
506f068e
TB
700Returns a unique thread identification number within the current team.
701In sequential parts of the program, @code{omp_get_thread_num}
702always returns 0. In parallel regions the return value varies
703from 0 to @code{omp_get_num_threads}-1 inclusive. The return
704value of the primary thread of a team is always 0.
d77de738
ML
705
706@item @emph{C/C++}:
707@multitable @columnfractions .20 .80
506f068e 708@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
d77de738
ML
709@end multitable
710
711@item @emph{Fortran}:
712@multitable @columnfractions .20 .80
506f068e 713@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
d77de738
ML
714@end multitable
715
716@item @emph{See also}:
506f068e 717@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
d77de738
ML
718
719@item @emph{Reference}:
506f068e 720@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
d77de738
ML
721@end table
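
The following sketch shows one common use of @code{omp_get_thread_num}: indexing per-thread slots of a shared array. The helper name @code{count_distinct_thread_ids} and the constant @code{MAX_TEAM} are hypothetical, and the serial stand-ins are an assumption so that the code also builds without @option{-fopenmp}.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) so that the code also
   builds without -fopenmp.  */
static int omp_get_thread_num (void) { return 0; }
static int omp_get_num_threads (void) { return 1; }
#endif

enum { MAX_TEAM = 4 };   /* Illustrative cap on the team size.  */

/* Hypothetical helper: run a parallel region of at most MAX_TEAM threads
   and count how many distinct thread IDs were observed.  */
int
count_distinct_thread_ids (void)
{
  int seen[MAX_TEAM] = { 0 };
  int team = 0, distinct = 0;

#pragma omp parallel num_threads (MAX_TEAM)
  {
    int id = omp_get_thread_num ();

    /* IDs range from 0 to omp_get_num_threads () - 1 inclusive.  */
    assert (id >= 0 && id < omp_get_num_threads ());
    seen[id] = 1;   /* Each thread writes only its own slot.  */
#pragma omp single
    team = omp_get_num_threads ();
  }

  for (int i = 0; i < MAX_TEAM; i++)
    distinct += seen[i];
  assert (distinct == team);   /* Every ID in the team is unique.  */
  return distinct;
}
```

Because each thread touches only the element selected by its own ID, no further synchronization is needed for the array.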
722
723
724
506f068e
TB
725@node omp_in_parallel
726@subsection @code{omp_in_parallel} -- Whether a parallel region is active
d77de738
ML
727@table @asis
728@item @emph{Description}:
506f068e
TB
729This function returns @code{true} if currently running in parallel,
730@code{false} otherwise. Here, @code{true} and @code{false} represent
731their language-specific counterparts.
d77de738
ML
732
733@item @emph{C/C++}:
734@multitable @columnfractions .20 .80
506f068e 735@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
d77de738
ML
736@end multitable
737
738@item @emph{Fortran}:
739@multitable @columnfractions .20 .80
506f068e 740@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
d77de738
ML
741@end multitable
742
d77de738 743@item @emph{Reference}:
506f068e 744@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
d77de738
ML
745@end table
746
747
506f068e
TB
748@node omp_set_dynamic
749@subsection @code{omp_set_dynamic} -- Enable/disable dynamic teams
d77de738
ML
750@table @asis
751@item @emph{Description}:
506f068e
TB
752Enable or disable the dynamic adjustment of the number of threads
753within a team. The function takes the language-specific equivalent
754of @code{true} and @code{false}, where @code{true} enables dynamic
755adjustment of team sizes and @code{false} disables it.
d77de738 756
506f068e 757@item @emph{C/C++}:
d77de738 758@multitable @columnfractions .20 .80
506f068e 759@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
d77de738
ML
760@end multitable
761
762@item @emph{Fortran}:
763@multitable @columnfractions .20 .80
506f068e
TB
764@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
765@item @tab @code{logical, intent(in) :: dynamic_threads}
d77de738
ML
766@end multitable
767
768@item @emph{See also}:
506f068e 769@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
d77de738
ML
770
771@item @emph{Reference}:
506f068e 772@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
d77de738
ML
773@end table
774
775
776
777@node omp_get_dynamic
506f068e 778@subsection @code{omp_get_dynamic} -- Dynamic teams setting
d77de738
ML
779@table @asis
780@item @emph{Description}:
781This function returns @code{true} if dynamic adjustment of the number of threads is enabled, @code{false} otherwise.
782Here, @code{true} and @code{false} represent their language-specific
783counterparts.
784
785The dynamic team setting may be initialized at startup by the
786@env{OMP_DYNAMIC} environment variable or at runtime using
787@code{omp_set_dynamic}. If undefined, dynamic adjustment is
788disabled by default.
789
790@item @emph{C/C++}:
791@multitable @columnfractions .20 .80
792@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
793@end multitable
794
795@item @emph{Fortran}:
796@multitable @columnfractions .20 .80
797@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
798@end multitable
799
800@item @emph{See also}:
801@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
802
803@item @emph{Reference}:
804@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
805@end table
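
A typical pattern is to save the current setting with @code{omp_get_dynamic}, change it with @code{omp_set_dynamic} and restore it afterwards. The sketch below illustrates this; the helper name @code{team_size_without_dynamic} is hypothetical, and the serial stand-ins are an assumption so that the code also builds without @option{-fopenmp}.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) so that the code also
   builds without -fopenmp.  */
static int dyn_var;
static void omp_set_dynamic (int enable) { dyn_var = enable != 0; }
static int omp_get_dynamic (void) { return dyn_var; }
static int omp_get_num_threads (void) { return 1; }
#endif

/* Hypothetical helper: run a parallel region with dynamic adjustment
   disabled, restore the previous setting and return the team size.  */
int
team_size_without_dynamic (int n)
{
  int previous = omp_get_dynamic ();
  int size = 0;

  omp_set_dynamic (0);
  assert (!omp_get_dynamic ());
#pragma omp parallel num_threads (n)
  {
#pragma omp single
    size = omp_get_num_threads ();
  }
  omp_set_dynamic (previous);   /* Leave the setting as it was found.  */
  return size;
}
```

Even with dynamic adjustment disabled, the team may be smaller than requested if a thread limit applies, so the result is best read as at most @code{n}.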
806
807
808
506f068e
TB
809@node omp_get_cancellation
810@subsection @code{omp_get_cancellation} -- Whether cancellation support is enabled
d77de738
ML
811@table @asis
812@item @emph{Description}:
506f068e
TB
813This function returns @code{true} if cancellation is activated, @code{false}
814otherwise. Here, @code{true} and @code{false} represent their language-specific
815counterparts. Unless @env{OMP_CANCELLATION} is set true, cancellations are
816deactivated.
d77de738 817
506f068e 818@item @emph{C/C++}:
d77de738 819@multitable @columnfractions .20 .80
506f068e 820@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
d77de738
ML
821@end multitable
822
823@item @emph{Fortran}:
824@multitable @columnfractions .20 .80
506f068e 825@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
d77de738
ML
826@end multitable
827
828@item @emph{See also}:
506f068e 829@ref{OMP_CANCELLATION}
d77de738
ML
830
831@item @emph{Reference}:
506f068e 832@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
d77de738
ML
833@end table
834
835
836
506f068e
TB
837@node omp_set_nested
838@subsection @code{omp_set_nested} -- Enable/disable nested parallel regions
d77de738
ML
839@table @asis
840@item @emph{Description}:
506f068e
TB
841Enable or disable nested parallel regions, i.e., whether team members
842are allowed to create new teams. The function takes the language-specific
843equivalent of @code{true} and @code{false}, where @code{true} enables
844nested parallel regions and @code{false} disables them.
d77de738 845
15886c03 846Enabling nested parallel regions also sets the maximum number of
506f068e 847active nested regions to the maximum supported. Disabling nested parallel
15886c03 848regions sets the maximum number of active nested regions to one.
506f068e
TB
849
850Note that the @code{omp_set_nested} API routine was deprecated
851in the OpenMP specification 5.0 in favor of @code{omp_set_max_active_levels}.
852
853@item @emph{C/C++}:
d77de738 854@multitable @columnfractions .20 .80
506f068e 855@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
d77de738
ML
856@end multitable
857
858@item @emph{Fortran}:
859@multitable @columnfractions .20 .80
506f068e
TB
860@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
861@item @tab @code{logical, intent(in) :: nested}
d77de738
ML
862@end multitable
863
864@item @emph{See also}:
506f068e
TB
865@ref{omp_get_nested}, @ref{omp_set_max_active_levels},
866@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
d77de738
ML
867
868@item @emph{Reference}:
506f068e 869@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
d77de738
ML
870@end table
871
872
873
506f068e
TB
874@node omp_get_nested
875@subsection @code{omp_get_nested} -- Nested parallel regions
d77de738
ML
876@table @asis
877@item @emph{Description}:
506f068e
TB
878This function returns @code{true} if nested parallel regions are
879enabled, @code{false} otherwise. Here, @code{true} and @code{false}
880represent their language-specific counterparts.
881
882The state of nested parallel regions at startup depends on several
883environment variables. If @env{OMP_MAX_ACTIVE_LEVELS} is defined
884and is set to greater than one, then nested parallel regions will be
885enabled. If not defined, then the value of the @env{OMP_NESTED}
886environment variable will be followed if defined. If neither is
887defined, then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND}
888is defined with a list of more than one value, then nested parallel
889regions are enabled. If none of these are defined, then nested parallel
890regions are disabled by default.
891
892Nested parallel regions can be enabled or disabled at runtime using
893@code{omp_set_nested}, or by setting the maximum number of nested
894regions with @code{omp_set_max_active_levels} to one to disable, or
895above one to enable.
896
897Note that the @code{omp_get_nested} API routine was deprecated
898in the OpenMP specification 5.0 in favor of @code{omp_get_max_active_levels}.
899
900@item @emph{C/C++}:
901@multitable @columnfractions .20 .80
902@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
903@end multitable
904
905@item @emph{Fortran}:
906@multitable @columnfractions .20 .80
907@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
908@end multitable
909
910@item @emph{See also}:
911@ref{omp_get_max_active_levels}, @ref{omp_set_nested},
912@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
913
914@item @emph{Reference}:
915@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
916@end table
917
918
919
920@node omp_set_schedule
921@subsection @code{omp_set_schedule} -- Set the runtime scheduling method
922@table @asis
923@item @emph{Description}:
924Sets the runtime scheduling method. The @var{kind} argument can have the
925value @code{omp_sched_static}, @code{omp_sched_dynamic},
926@code{omp_sched_guided} or @code{omp_sched_auto}. Except for
927@code{omp_sched_auto}, the chunk size is set to the value of
928@var{chunk_size} if positive, or to the default value if zero or negative.
929For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.
d77de738
ML
930
931@item @emph{C/C++}:
932@multitable @columnfractions .20 .80
506f068e 933@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
d77de738
ML
934@end multitable
935
936@item @emph{Fortran}:
937@multitable @columnfractions .20 .80
506f068e
TB
938@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
939@item @tab @code{integer(kind=omp_sched_kind) kind}
940@item @tab @code{integer chunk_size}
d77de738
ML
941@end multitable
942
943@item @emph{See also}:
506f068e
TB
944@ref{omp_get_schedule},
945@ref{OMP_SCHEDULE}
d77de738
ML
946
947@item @emph{Reference}:
506f068e 948@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
d77de738
ML
949@end table
950
951
506f068e
TB
952
953@node omp_get_schedule
954@subsection @code{omp_get_schedule} -- Obtain the runtime scheduling method
d77de738
ML
955@table @asis
956@item @emph{Description}:
15886c03
TB
957Obtain the runtime scheduling method. The @var{kind} argument is set to
958@code{omp_sched_static}, @code{omp_sched_dynamic},
506f068e
TB
959@code{omp_sched_guided} or @code{omp_sched_auto}. The second argument,
960@var{chunk_size}, is set to the chunk size.
d77de738
ML
961
962@item @emph{C/C++}:
963@multitable @columnfractions .20 .80
506f068e 964@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
d77de738
ML
965@end multitable
966
967@item @emph{Fortran}:
968@multitable @columnfractions .20 .80
506f068e
TB
969@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
970@item @tab @code{integer(kind=omp_sched_kind) kind}
971@item @tab @code{integer chunk_size}
d77de738
ML
972@end multitable
973
506f068e
TB
974@item @emph{See also}:
975@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
976
d77de738 977@item @emph{Reference}:
506f068e 978@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
d77de738
ML
979@end table
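
The schedule set with @code{omp_set_schedule} applies to loops that use @code{schedule(runtime)} and can be read back with @code{omp_get_schedule}. The sketch below illustrates both; the helper names @code{sum_with_runtime_schedule} and @code{current_chunk_size} are hypothetical, and the serial stand-ins (including the enumerator values) are an assumption so that the code also builds without @option{-fopenmp}.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) so that the code also
   builds without -fopenmp.  */
typedef enum { omp_sched_static = 1, omp_sched_dynamic = 2,
               omp_sched_guided = 3, omp_sched_auto = 4 } omp_sched_t;
static omp_sched_t sched_kind = omp_sched_static;
static int sched_chunk;
static void omp_set_schedule (omp_sched_t kind, int chunk)
{ sched_kind = kind; sched_chunk = chunk; }
static void omp_get_schedule (omp_sched_t *kind, int *chunk)
{ *kind = sched_kind; *chunk = sched_chunk; }
#endif

/* Hypothetical helper: sum 1..N with a loop whose schedule is chosen at
   run time; here dynamic scheduling with chunks of four iterations.  */
long
sum_with_runtime_schedule (int n)
{
  long sum = 0;

  assert (n >= 0);
  omp_set_schedule (omp_sched_dynamic, 4);
#pragma omp parallel for schedule (runtime) reduction (+:sum)
  for (int i = 1; i <= n; i++)
    sum += i;
  return sum;
}

/* Hypothetical helper: report the chunk size currently in effect.  */
int
current_chunk_size (void)
{
  omp_sched_t kind;
  int chunk;

  omp_get_schedule (&kind, &chunk);
  (void) kind;   /* Only the chunk size is of interest here.  */
  return chunk;
}
```

The result of the loop does not depend on the schedule; only the distribution of iterations among the threads changes.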
980
981
506f068e
TB
982@node omp_get_teams_thread_limit
983@subsection @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
d77de738
ML
984@table @asis
985@item @emph{Description}:
15886c03 986Return the maximum number of threads that are able to participate in
506f068e 987each team created by a teams construct.
d77de738
ML
988
989@item @emph{C/C++}:
990@multitable @columnfractions .20 .80
506f068e 991@item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
d77de738
ML
992@end multitable
993
994@item @emph{Fortran}:
995@multitable @columnfractions .20 .80
506f068e 996@item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
d77de738
ML
997@end multitable
998
999@item @emph{See also}:
506f068e 1000@ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}
d77de738
ML
1001
1002@item @emph{Reference}:
506f068e 1003@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
d77de738
ML
1004@end table
1005
1006
1007
506f068e
TB
1008@node omp_get_supported_active_levels
1009@subsection @code{omp_get_supported_active_levels} -- Maximum number of active regions supported
d77de738
ML
1010@table @asis
1011@item @emph{Description}:
506f068e
TB
1012This function returns the maximum number of nested, active parallel regions
1013supported by this implementation.
d77de738 1014
506f068e 1015@item @emph{C/C++}:
d77de738 1016@multitable @columnfractions .20 .80
506f068e 1017@item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}
d77de738
ML
1018@end multitable
1019
1020@item @emph{Fortran}:
1021@multitable @columnfractions .20 .80
506f068e 1022@item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}
d77de738
ML
1023@end multitable
1024
1025@item @emph{See also}:
506f068e 1026@ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
d77de738
ML
1027
1028@item @emph{Reference}:
506f068e 1029@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
d77de738
ML
1030@end table
1031
1032
1033
506f068e
TB
1034@node omp_set_max_active_levels
1035@subsection @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
d77de738
ML
1036@table @asis
1037@item @emph{Description}:
506f068e
TB
1038This function limits the maximum allowed number of nested, active
1039parallel regions. @var{max_levels} must be less than or equal to
1040the value returned by @code{omp_get_supported_active_levels}.
d77de738 1041
506f068e
TB
1042@item @emph{C/C++}:
1043@multitable @columnfractions .20 .80
1044@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
1045@end multitable
d77de738 1046
506f068e
TB
1047@item @emph{Fortran}:
1048@multitable @columnfractions .20 .80
1049@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
1050@item @tab @code{integer max_levels}
1051@end multitable
d77de738 1052
506f068e
TB
1053@item @emph{See also}:
1054@ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
1055@ref{omp_get_supported_active_levels}
2cd0689a 1056
506f068e
TB
1057@item @emph{Reference}:
1058@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
1059@end table
1060
1061
1062
1063@node omp_get_max_active_levels
1064@subsection @code{omp_get_max_active_levels} -- Current maximum number of active regions
1065@table @asis
1066@item @emph{Description}:
1067This function obtains the maximum allowed number of nested, active parallel regions.
1068
1069@item @emph{C/C++}:
d77de738 1070@multitable @columnfractions .20 .80
506f068e 1071@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
d77de738
ML
1072@end multitable
1073
1074@item @emph{Fortran}:
1075@multitable @columnfractions .20 .80
506f068e 1076@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
d77de738
ML
1077@end multitable
1078
1079@item @emph{See also}:
506f068e 1080@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
d77de738
ML
1081
1082@item @emph{Reference}:
506f068e 1083@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
d77de738
ML
1084@end table
1085
1086
506f068e
TB
1087@node omp_get_level
1088@subsection @code{omp_get_level} -- Obtain the current nesting level
d77de738
ML
1089@table @asis
1090@item @emph{Description}:
506f068e
TB
1091This function returns the number of nested parallel regions, active or
1092inactive, that enclose the call.
d77de738 1093
506f068e 1094@item @emph{C/C++}:
d77de738 1095@multitable @columnfractions .20 .80
506f068e 1096@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
d77de738
ML
1097@end multitable
1098
1099@item @emph{Fortran}:
1100@multitable @columnfractions .20 .80
506f068e 1101@item @emph{Interface}: @tab @code{integer function omp_get_level()}
d77de738
ML
1102@end multitable
1103
506f068e
TB
1104@item @emph{See also}:
1105@ref{omp_get_active_level}
1106
d77de738 1107@item @emph{Reference}:
506f068e 1108@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
d77de738
ML
1109@end table
1110
1111
1112
506f068e
TB
1113@node omp_get_ancestor_thread_num
1114@subsection @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
d77de738
ML
1115@table @asis
1116@item @emph{Description}:
506f068e
TB
1117This function returns the thread identification number for the given
1118nesting level of the current thread. For values of @var{level} outside
1119zero to @code{omp_get_level}, -1 is returned; if @var{level} is
1120@code{omp_get_level} the result is identical to @code{omp_get_thread_num}.
d77de738 1121
506f068e 1122@item @emph{C/C++}:
d77de738 1123@multitable @columnfractions .20 .80
506f068e 1124@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
d77de738
ML
1125@end multitable
1126
1127@item @emph{Fortran}:
1128@multitable @columnfractions .20 .80
506f068e
TB
1129@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
1130@item @tab @code{integer level}
d77de738
ML
1131@end multitable
1132
506f068e
TB
1133@item @emph{See also}:
1134@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
1135
d77de738 1136@item @emph{Reference}:
506f068e 1137@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
d77de738
ML
1138@end table
1139
1140
1141
506f068e
TB
1142@node omp_get_team_size
1143@subsection @code{omp_get_team_size} -- Number of threads in a team
d77de738
ML
1144@table @asis
1145@item @emph{Description}:
506f068e
TB
1146This function returns the number of threads in a thread team to which
1147either the current thread or its ancestor belongs. For values of @var{level}
1148outside zero to @code{omp_get_level}, -1 is returned; if @var{level} is zero,
11491 is returned, and for @code{omp_get_level}, the result is identical
1150to @code{omp_get_num_threads}.
d77de738
ML
1151
1152@item @emph{C/C++}:
1153@multitable @columnfractions .20 .80
506f068e 1154@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
d77de738
ML
1155@end multitable
1156
1157@item @emph{Fortran}:
1158@multitable @columnfractions .20 .80
506f068e
TB
1159@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
1160@item @tab @code{integer level}
d77de738
ML
1161@end multitable
1162
506f068e
TB
1163@item @emph{See also}:
1164@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
1165
d77de738 1166@item @emph{Reference}:
506f068e 1167@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
d77de738
ML
1168@end table
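
The relationships among @code{omp_get_level}, @code{omp_get_ancestor_thread_num} and @code{omp_get_team_size} described above can be checked with a small sketch. The helper names @code{level_query_mismatches} and @code{nested_level_mismatches} are hypothetical, and the serial stand-ins are an assumption so that the code also builds without @option{-fopenmp}.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) so that the code also
   builds without -fopenmp.  */
static void omp_set_max_active_levels (int n) { (void) n; }
static int omp_get_level (void) { return 0; }
static int omp_get_thread_num (void) { return 0; }
static int omp_get_num_threads (void) { return 1; }
static int omp_get_ancestor_thread_num (int l) { return l == 0 ? 0 : -1; }
static int omp_get_team_size (int l) { return l == 0 ? 1 : -1; }
#endif

/* Count how many of the documented identities fail for the caller.  */
static int
level_query_mismatches (void)
{
  int level = omp_get_level ();
  int bad = 0;

  assert (level >= 0);   /* The nesting level is never negative.  */
  /* At the current level, the queries describe the calling thread.  */
  bad += omp_get_ancestor_thread_num (level) != omp_get_thread_num ();
  bad += omp_get_team_size (level) != omp_get_num_threads ();
  /* Level zero is the initial thread, alone in its team.  */
  bad += omp_get_ancestor_thread_num (0) != 0;
  bad += omp_get_team_size (0) != 1;
  /* Levels outside zero to omp_get_level () yield -1.  */
  bad += omp_get_ancestor_thread_num (level + 1) != -1;
  bad += omp_get_team_size (-1) != -1;
  return bad;
}

/* Hypothetical helper: run the checks outside any parallel region and
   at nesting levels one and two; return the total number of failures.  */
int
nested_level_mismatches (void)
{
  int bad = level_query_mismatches ();

  omp_set_max_active_levels (2);
#pragma omp parallel num_threads (2) reduction (+:bad)
  {
    bad += level_query_mismatches ();
#pragma omp parallel num_threads (2) reduction (+:bad)
    bad += level_query_mismatches ();
  }
  return bad;
}
```

A return value of zero from @code{nested_level_mismatches} means that all identities held at every nesting level that was visited.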
1169
1170
1171
506f068e
TB
1172@node omp_get_active_level
1173@subsection @code{omp_get_active_level} -- Number of active parallel regions
d77de738
ML
1174@table @asis
1175@item @emph{Description}:
506f068e
TB
1176This function returns the number of nested, active parallel regions
1177that enclose the call.
d77de738 1178
506f068e 1179@item @emph{C/C++}:
d77de738 1180@multitable @columnfractions .20 .80
506f068e 1181@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
d77de738
ML
1182@end multitable
1183
1184@item @emph{Fortran}:
1185@multitable @columnfractions .20 .80
506f068e 1186@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
d77de738
ML
1187@end multitable
1188
1189@item @emph{See also}:
506f068e 1190@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
d77de738
ML
1191
1192@item @emph{Reference}:
506f068e 1193@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
d77de738
ML
1194@end table
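
The effect of @code{omp_set_max_active_levels} on the value reported by @code{omp_get_active_level} can be sketched as follows. The helper name @code{deepest_active_level} is hypothetical, and the serial stand-ins are an assumption so that the code also builds without @option{-fopenmp}.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) so that the code also
   builds without -fopenmp.  */
static void omp_set_max_active_levels (int n) { (void) n; }
static int omp_get_active_level (void) { return 0; }
#endif

/* Hypothetical helper: limit the number of active levels, enter two
   nested parallel regions and return the largest active level that any
   thread observed in the inner region.  */
int
deepest_active_level (int max_levels)
{
  int deepest = 0;

  assert (max_levels >= 0);
  omp_set_max_active_levels (max_levels);
#pragma omp parallel num_threads (2) reduction (max:deepest)
  {
#pragma omp parallel num_threads (2) reduction (max:deepest)
    {
      int active = omp_get_active_level ();

      if (active > deepest)
        deepest = active;
    }
  }
  return deepest;
}
```

With a limit of one, at most one of the two regions is active and the other runs with a team of a single thread, so the result never exceeds the limit even though the inner region is nested two levels deep.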
1195
1196
1197
506f068e
TB
1198@node Thread Affinity Routines
1199@section Thread Affinity Routines
1200
1201Routines controlling and accessing thread-affinity policies.
1202They have C linkage and do not throw exceptions.
1203
1204@menu
1205* omp_get_proc_bind:: Whether threads may be moved between CPUs
1206@c * omp_get_num_places:: <fixme>
1207@c * omp_get_place_num_procs:: <fixme>
1208@c * omp_get_place_proc_ids:: <fixme>
1209@c * omp_get_place_num:: <fixme>
1210@c * omp_get_partition_num_places:: <fixme>
1211@c * omp_get_partition_place_nums:: <fixme>
1212@c * omp_set_affinity_format:: <fixme>
1213@c * omp_get_affinity_format:: <fixme>
1214@c * omp_display_affinity:: <fixme>
1215@c * omp_capture_affinity:: <fixme>
1216@end menu
1217
1218
1219
d77de738 1220@node omp_get_proc_bind
506f068e 1221@subsection @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
d77de738
ML
1222@table @asis
1223@item @emph{Description}:
1224This function returns the currently active thread affinity policy, which is
1225set via @env{OMP_PROC_BIND}. Possible values are @code{omp_proc_bind_false},
1226@code{omp_proc_bind_true}, @code{omp_proc_bind_primary},
1227@code{omp_proc_bind_master}, @code{omp_proc_bind_close} and @code{omp_proc_bind_spread},
1228where @code{omp_proc_bind_master} is an alias for @code{omp_proc_bind_primary}.
1229
1230@item @emph{C/C++}:
1231@multitable @columnfractions .20 .80
1232@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
1233@end multitable
1234
1235@item @emph{Fortran}:
1236@multitable @columnfractions .20 .80
1237@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
1238@end multitable
1239
1240@item @emph{See also}:
1241@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}
1242
1243@item @emph{Reference}:
1244@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
1245@end table
1246
1247
1248
506f068e
TB
1249@node Teams Region Routines
1250@section Teams Region Routines
d77de738 1251
506f068e
TB
1252Routines controlling the league of teams that are executed in a @code{teams}
1253region. They have C linkage and do not throw exceptions.
d77de738 1254
506f068e
TB
1255@menu
1256* omp_get_num_teams:: Number of teams
1257* omp_get_team_num:: Get team number
1258* omp_set_num_teams:: Set upper teams limit for teams region
1259* omp_get_max_teams:: Maximum number of teams for teams region
1260* omp_set_teams_thread_limit:: Set upper thread limit for teams construct
1261* omp_get_thread_limit:: Maximum number of threads
1262@end menu
d77de738 1263
d77de738
ML
1264
1265
506f068e
TB
1266@node omp_get_num_teams
1267@subsection @code{omp_get_num_teams} -- Number of teams
d77de738
ML
1268@table @asis
1269@item @emph{Description}:
506f068e 1270Returns the number of teams in the current teams region.
d77de738 1271
506f068e 1272@item @emph{C/C++}:
d77de738 1273@multitable @columnfractions .20 .80
506f068e 1274@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
d77de738
ML
1275@end multitable
1276
1277@item @emph{Fortran}:
1278@multitable @columnfractions .20 .80
506f068e 1279@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
d77de738
ML
1280@end multitable
1281
d77de738 1282@item @emph{Reference}:
506f068e 1283@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
d77de738
ML
1284@end table
1285
1286
1287
1288@node omp_get_team_num
506f068e 1289@subsection @code{omp_get_team_num} -- Get team number
d77de738
ML
1290@table @asis
1291@item @emph{Description}:
1292Returns the team number of the calling thread.
1293
1294@item @emph{C/C++}:
1295@multitable @columnfractions .20 .80
1296@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
1297@end multitable
1298
1299@item @emph{Fortran}:
1300@multitable @columnfractions .20 .80
1301@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
1302@end multitable
1303
1304@item @emph{Reference}:
1305@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
1306@end table
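
The two query routines above can be used together inside a host @code{teams} region, as in the sketch below. It assumes a compiler that accepts a @code{teams} construct outside of a @code{target} region (an OpenMP 5.0 feature); the helper name @code{count_teams} is hypothetical, and the serial stand-ins are an assumption so that the code also builds without @option{-fopenmp}.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) so that the code also
   builds without -fopenmp.  */
static int omp_get_num_teams (void) { return 1; }
static int omp_get_team_num (void) { return 0; }
#endif

/* Hypothetical helper: create a league of at most REQUESTED teams and
   return how many teams ran, or -1 if a team number was out of range.
   It must not be called from within another OpenMP construct.  */
int
count_teams (int requested)
{
  int teams = 0, bad = 0;

  assert (requested > 0);
#pragma omp teams num_teams (requested) reduction (+:teams, bad)
  {
    int n = omp_get_num_teams ();
    int t = omp_get_team_num ();

    teams += 1;                    /* One contribution per team.  */
    bad += (t < 0 || t >= n);      /* Team numbers lie in 0..n-1.  */
  }
  return bad ? -1 : teams;
}
```

Outside of a @code{teams} region, @code{omp_get_num_teams} returns 1 and @code{omp_get_team_num} returns 0.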
1307
1308
1309
506f068e
TB
1310@node omp_set_num_teams
1311@subsection @code{omp_set_num_teams} -- Set upper teams limit for teams construct
d77de738
ML
1312@table @asis
1313@item @emph{Description}:
506f068e
TB
1314Specifies the upper bound for the number of teams created by a teams construct
1315that does not specify a @code{num_teams} clause. The
1316argument of @code{omp_set_num_teams} shall be a positive integer.
d77de738
ML
1317
1318@item @emph{C/C++}:
1319@multitable @columnfractions .20 .80
506f068e 1320@item @emph{Prototype}: @tab @code{void omp_set_num_teams(int num_teams);}
d77de738
ML
1321@end multitable
1322
1323@item @emph{Fortran}:
1324@multitable @columnfractions .20 .80
506f068e
TB
1325@item @emph{Interface}: @tab @code{subroutine omp_set_num_teams(num_teams)}
1326@item @tab @code{integer, intent(in) :: num_teams}
d77de738
ML
1327@end multitable
1328
1329@item @emph{See also}:
506f068e 1330@ref{OMP_NUM_TEAMS}, @ref{omp_get_num_teams}, @ref{omp_get_max_teams}
d77de738
ML
1331
1332@item @emph{Reference}:
506f068e 1333@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.3.
d77de738
ML
1334@end table
1335
1336
1337
506f068e
TB
1338@node omp_get_max_teams
1339@subsection @code{omp_get_max_teams} -- Maximum number of teams of teams region
d77de738
ML
1340@table @asis
1341@item @emph{Description}:
506f068e
TB
1342Return the maximum number of teams used for the teams region
1343that does not use the clause @code{num_teams}.
d77de738
ML
1344
1345@item @emph{C/C++}:
1346@multitable @columnfractions .20 .80
506f068e 1347@item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
d77de738
ML
1348@end multitable
1349
1350@item @emph{Fortran}:
1351@multitable @columnfractions .20 .80
506f068e 1352@item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
d77de738
ML
1353@end multitable
1354
1355@item @emph{See also}:
506f068e 1356@ref{omp_set_num_teams}, @ref{omp_get_num_teams}
d77de738
ML
1357
1358@item @emph{Reference}:
506f068e 1359@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
d77de738
ML
1360@end table
1361
1362
1363
506f068e
TB
1364@node omp_set_teams_thread_limit
1365@subsection @code{omp_set_teams_thread_limit} -- Set upper thread limit for teams construct
d77de738
ML
1366@table @asis
1367@item @emph{Description}:
15886c03 1368Specifies the upper bound for the number of threads that are available
506f068e
TB
1369for each team created by a teams construct that does not specify a
1370@code{thread_limit} clause. The argument of
1371@code{omp_set_teams_thread_limit} shall be a positive integer.
d77de738
ML
1372
1373@item @emph{C/C++}:
1374@multitable @columnfractions .20 .80
506f068e 1375@item @emph{Prototype}: @tab @code{void omp_set_teams_thread_limit(int thread_limit);}
d77de738
ML
1376@end multitable
1377
1378@item @emph{Fortran}:
1379@multitable @columnfractions .20 .80
506f068e
TB
1380@item @emph{Interface}: @tab @code{subroutine omp_set_teams_thread_limit(thread_limit)}
1381@item @tab @code{integer, intent(in) :: thread_limit}
d77de738
ML
1382@end multitable
1383
1384@item @emph{See also}:
506f068e 1385@ref{OMP_TEAMS_THREAD_LIMIT}, @ref{omp_get_teams_thread_limit}, @ref{omp_get_thread_limit}
d77de738
ML
1386
1387@item @emph{Reference}:
506f068e 1388@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.5.
d77de738
ML
1389@end table
1390
1391
1392
506f068e
TB
1393@node omp_get_thread_limit
1394@subsection @code{omp_get_thread_limit} -- Maximum number of threads
d77de738
ML
1395@table @asis
1396@item @emph{Description}:
506f068e 1397Return the maximum number of threads of the program.
d77de738
ML
1398
1399@item @emph{C/C++}:
1400@multitable @columnfractions .20 .80
506f068e 1401@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
d77de738
ML
1402@end multitable
1403
1404@item @emph{Fortran}:
1405@multitable @columnfractions .20 .80
506f068e 1406@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
d77de738
ML
1407@end multitable
1408
1409@item @emph{See also}:
506f068e 1410@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
d77de738
ML
1411
1412@item @emph{Reference}:
506f068e 1413@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
d77de738
ML
1414@end table
1415
1416
1417
506f068e
TB
1418@node Tasking Routines
1419@section Tasking Routines
1420
1421Routines relating to explicit tasks.
1422They have C linkage and do not throw exceptions.
1423
1424@menu
1425* omp_get_max_task_priority:: Maximum task priority value that can be set
819f3d36 1426* omp_in_explicit_task:: Whether a given task is an explicit task
506f068e 1427* omp_in_final:: Whether in final or included task region
fcddf7ce
TB
1428@c * omp_is_free_agent:: <fixme>/TR12
1429@c * omp_ancestor_is_free_agent:: <fixme>/TR12
506f068e
TB
1430@end menu
1431
1432
1433
1434@node omp_get_max_task_priority
1435@subsection @code{omp_get_max_task_priority} -- Maximum priority value that can be set for tasks
d77de738
ML
1437@table @asis
1438@item @emph{Description}:
506f068e 1439This function obtains the maximum allowed priority number for tasks.
d77de738 1440
506f068e 1441@item @emph{C/C++}:
d77de738 1442@multitable @columnfractions .20 .80
506f068e 1443@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
d77de738
ML
1444@end multitable
1445
1446@item @emph{Fortran}:
1447@multitable @columnfractions .20 .80
506f068e 1448@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
d77de738
ML
1449@end multitable
1450
1451@item @emph{Reference}:
506f068e 1452@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
d77de738
ML
1453@end table
1454
1455
506f068e 1456
819f3d36
TB
1457@node omp_in_explicit_task
1458@subsection @code{omp_in_explicit_task} -- Whether a given task is an explicit task
1459@table @asis
1460@item @emph{Description}:
1461The function returns the @var{explicit-task-var} ICV; it returns true when the
1462encountering task was generated by a task-generating construct such as
1463@code{target}, @code{task} or @code{taskloop}. Otherwise, the encountering task
1464is in an implicit task region, such as one generated by an implicit or explicit
1465@code{parallel} region and @code{omp_in_explicit_task} returns false.
1466
1467@item @emph{C/C++}:
1468@multitable @columnfractions .20 .80
1469@item @emph{Prototype}: @tab @code{int omp_in_explicit_task(void);}
1470@end multitable
1471
1472@item @emph{Fortran}:
1473@multitable @columnfractions .20 .80
1474@item @emph{Interface}: @tab @code{logical function omp_in_explicit_task()}
1475@end multitable
1476
1477@item @emph{Reference}:
1478@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 18.5.2.
1479@end table


@node omp_in_final
@subsection @code{omp_in_final} -- Whether in final or included task region
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise. Here, @code{true}
and @code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table
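
The following sketch illustrates the description above by querying
@code{omp_in_final} from inside a task created with @code{final(1)}. The
helper name @code{in_final_inside_final_task} is hypothetical, and the serial
fallback (which simply reports true) is an assumption so that the code also
builds without @option{-fopenmp}.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback; serial code is treated as final here. */
static int omp_in_final(void) { return 1; }
#endif

/* Returns nonzero if omp_in_final() reports a final task region
   when called from inside a task created with final(1). */
static int in_final_inside_final_task(void)
{
    int result = 0;
    #pragma omp task final(1) shared(result)
    {
        result = omp_in_final();
    }
    #pragma omp taskwait  /* make sure the task has completed */
    return result;
}
```

A typical use is to switch to a serial code path once the task tree has
become final, instead of creating further tasks.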


@c @node Resource Relinquishing Routines
@c @section Resource Relinquishing Routines
@c
@c Routines releasing resources used by the OpenMP runtime.
@c They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_pause_resource:: <fixme>
@c * omp_pause_resource_all:: <fixme>
@c @end menu

@node Device Information Routines
@section Device Information Routines

Routines related to devices available to an OpenMP program.
They have C linkage and do not throw exceptions.

@menu
* omp_get_num_procs:: Number of processors online
@c * omp_get_max_progress_width:: <fixme>/TR11
* omp_set_default_device:: Set the default device for target regions
* omp_get_default_device:: Get the default device for target regions
* omp_get_num_devices:: Number of target devices
* omp_get_device_num:: Get device that current thread is running on
* omp_is_initial_device:: Whether executing on the host device
* omp_get_initial_device:: Device number of host device
@end menu


@node omp_get_num_procs
@subsection @code{omp_get_num_procs} -- Number of processors online
@table @asis
@item @emph{Description}:
Returns the number of processors online on the current device.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table


@node omp_set_default_device
@subsection @code{omp_set_default_device} -- Set the default device for target regions
@table @asis
@item @emph{Description}:
Set the default device for target regions without a @code{device} clause.
The argument shall be a nonnegative device number.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item @tab @code{integer device_num}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table


@node omp_get_default_device
@subsection @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without a @code{device} clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table
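
The two routines are often used as a pair. The sketch below, under the
assumption that device number 0 is acceptable as a default, changes the
default device and hands back the previous value so that it can be restored.
The helper @code{swap_default_device} and the serial fallback are hypothetical
and exist only for this illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback so the sketch also builds without -fopenmp. */
static int fallback_default_device = 0;
static int omp_get_default_device(void) { return fallback_default_device; }
static void omp_set_default_device(int d) { fallback_default_device = d; }
#endif

/* Make device_num the default device and return the previous default,
   so that the caller can restore it later. */
static int swap_default_device(int device_num)
{
    int previous = omp_get_default_device();
    omp_set_default_device(device_num);
    return previous;
}
```

Calling @code{swap_default_device} again with the returned value undoes the
change once the affected @code{target} regions have run.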


@node omp_get_num_devices
@subsection @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table


@node omp_get_device_num
@subsection @code{omp_get_device_num} -- Return device number of current device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the device that the
current thread is executing on. For OpenMP 5.0, this must be equal to the
value returned by the @code{omp_get_initial_device} function when called
from the host.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_initial_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
@end table


@node omp_is_initial_device
@subsection @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise. Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table


@node omp_get_initial_device
@subsection @code{omp_get_initial_device} -- Return device number of initial device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the host device.
For OpenMP 5.1, this must be equal to the value returned by the
@code{omp_get_num_devices} function.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_devices}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
@end table
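
One possible way to combine the device information routines is sketched
below: a requested device number is accepted only if it names an available
target device, and the host is used otherwise. The helper
@code{choose_device} and the serial fallback are hypothetical and not part
of libgomp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback: no offload devices, host is device 0. */
static int omp_get_num_devices(void) { return 0; }
static int omp_get_initial_device(void) { return 0; }
#endif

/* Return preferred if it names an available target device,
   otherwise fall back to the host (initial) device. */
static int choose_device(int preferred)
{
    if (preferred >= 0 && preferred < omp_get_num_devices())
        return preferred;
    return omp_get_initial_device();
}
```

The returned number can be used wherever a device number is expected, for
instance in a @code{device} clause or in the device memory routines.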


@node Device Memory Routines
@section Device Memory Routines

Routines related to memory allocation and managing corresponding
pointers on devices. They have C linkage and do not throw exceptions.

@menu
* omp_target_alloc:: Allocate device memory
* omp_target_free:: Free device memory
* omp_target_is_present:: Check whether storage is mapped
@c * omp_target_is_accessible:: <fixme>
@c * omp_target_memcpy:: <fixme>
@c * omp_target_memcpy_rect:: <fixme>
@c * omp_target_memcpy_async:: <fixme>
@c * omp_target_memcpy_rect_async:: <fixme>
@c * omp_target_memset:: <fixme>/TR12
@c * omp_target_memset_async:: <fixme>/TR12
* omp_target_associate_ptr:: Associate a device pointer with a host pointer
* omp_target_disassociate_ptr:: Remove device--host pointer association
* omp_get_mapped_ptr:: Return device pointer to a host pointer
@end menu


@node omp_target_alloc
@subsection @code{omp_target_alloc} -- Allocate device memory
@table @asis
@item @emph{Description}:
This routine allocates @var{size} bytes of memory in the device environment
associated with the device number @var{device_num}. If successful, a device
pointer is returned, otherwise a null pointer.

In GCC, when the device is the host or the device shares memory with the host,
the memory is allocated on the host; in that case, when @var{size} is zero,
either NULL or a unique pointer value that can later be successfully passed to
@code{omp_target_free} is returned. When the allocation is not performed on
the host, a null pointer is returned when @var{size} is zero; in that case,
additionally a diagnostic might be printed to standard error (stderr).

Running this routine in a @code{target} region except on the initial device
is not supported.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *omp_target_alloc(size_t size, int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{type(c_ptr) function omp_target_alloc(size, device_num) bind(C)}
@item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t}
@item @tab @code{integer(c_size_t), value :: size}
@item @tab @code{integer(c_int), value :: device_num}
@end multitable

@item @emph{See also}:
@ref{omp_target_free}, @ref{omp_target_associate_ptr}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.1.
@end table


@node omp_target_free
@subsection @code{omp_target_free} -- Free device memory
@table @asis
@item @emph{Description}:
This routine frees memory allocated by the @code{omp_target_alloc} routine.
The @var{device_ptr} argument must be either a null pointer or a device pointer
returned by @code{omp_target_alloc} for the specified @var{device_num}. The
device number @var{device_num} must be a conforming device number.

Running this routine in a @code{target} region except on the initial device
is not supported.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_target_free(void *device_ptr, int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_target_free(device_ptr, device_num) bind(C)}
@item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
@item @tab @code{type(c_ptr), value :: device_ptr}
@item @tab @code{integer(c_int), value :: device_num}
@end multitable

@item @emph{See also}:
@ref{omp_target_alloc}, @ref{omp_target_disassociate_ptr}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.2.
@end table
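
The pairing of @code{omp_target_alloc} and @code{omp_target_free} can be
sketched as follows. The helper @code{alloc_roundtrip} is hypothetical, and
the serial fallback based on @code{malloc} is an assumption that mirrors the
host case described for @code{omp_target_alloc}; it lets the code build
without @option{-fopenmp}.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback: device memory is plain host memory. */
#include <stdlib.h>
static int omp_get_initial_device(void) { return 0; }
static void *omp_target_alloc(size_t n, int d) { (void)d; return malloc(n); }
static void omp_target_free(void *p, int d) { (void)d; free(p); }
#endif

/* Allocate size bytes on device dev and free them again.
   Returns 1 if the allocation succeeded, 0 otherwise. */
static int alloc_roundtrip(size_t size, int dev)
{
    void *p = omp_target_alloc(size, dev);
    if (p == NULL)
        return 0;              /* allocation failed */
    omp_target_free(p, dev);   /* same device number as for the allocation */
    return 1;
}
```

Note that the pointer is freed with the same device number that was used to
allocate it, as the description of @code{omp_target_free} requires.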


@node omp_target_is_present
@subsection @code{omp_target_is_present} -- Check whether storage is mapped
@table @asis
@item @emph{Description}:
This routine tests whether storage, identified by the host pointer @var{ptr},
is mapped to the device specified by @var{device_num}. If so, it returns
@emph{true} and otherwise @emph{false}.

In GCC, this includes self mapping such that @code{omp_target_is_present}
returns @emph{true} when @var{device_num} specifies the host or when the host
and the device share memory. If @var{ptr} is a null pointer, @emph{true} is
returned and if @var{device_num} is an invalid device number, @emph{false} is
returned.

If those conditions do not apply, @emph{true} is returned if the association has
been established by an explicit or implicit @code{map} clause, the
@code{declare target} directive or a call to the @code{omp_target_associate_ptr}
routine.

Running this routine in a @code{target} region except on the initial device
is not supported.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_target_is_present(const void *ptr,}
@item @tab @code{ int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(c_int) function omp_target_is_present(ptr, &}
@item @tab @code{ device_num) bind(C)}
@item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
@item @tab @code{type(c_ptr), value :: ptr}
@item @tab @code{integer(c_int), value :: device_num}
@end multitable

@item @emph{See also}:
@ref{omp_target_associate_ptr}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.3.
@end table
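
The self-mapping case described above for GCC can be illustrated with a
small sketch that queries the host device. The helper
@code{present_on_host} is hypothetical, and the serial fallback is an
assumption so that the code also builds without @option{-fopenmp}.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback: everything lives on the host. */
static int omp_get_initial_device(void) { return 0; }
static int omp_target_is_present(const void *p, int d)
{ (void)p; (void)d; return 1; }
#endif

/* Nonzero if ptr is considered present on the host (initial) device. */
static int present_on_host(void *ptr)
{
    return omp_target_is_present(ptr, omp_get_initial_device());
}
```

For a real offload device, the same call with that device's number reports
whether a mapping for the host address currently exists.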


@node omp_target_associate_ptr
@subsection @code{omp_target_associate_ptr} -- Associate a device pointer with a host pointer
@table @asis
@item @emph{Description}:
This routine associates storage on the host with storage on a device identified
by @var{device_num}. The device pointer is usually obtained by calling
@code{omp_target_alloc} or by other means (but not by using the @code{map}
clauses or the @code{declare target} directive). The host pointer should point
to memory that has a storage size of at least @var{size}.

The @var{device_offset} parameter specifies the offset into @var{device_ptr}
that is used as the base address for the device side of the mapping; the
storage size should be at least @var{device_offset} plus @var{size}.

After the association, the host pointer can be used in a @code{map} clause and
in the @code{to} and @code{from} clauses of the @code{target update} directive
to transfer data between the associated pointers. The reference count of such
associated storage is infinite. The association can be removed by calling
@code{omp_target_disassociate_ptr}, which should be done before the lifetime
of either storage ends.

The routine returns nonzero (@code{EINVAL}) when @var{device_num} is invalid,
or when it refers to the initial device or to a device that shares memory with
the host. @code{omp_target_associate_ptr} returns zero if @var{host_ptr} points
into storage that lies fully inside a previously associated memory region.
Otherwise, zero is returned if the association was successful; in all
remaining cases, nonzero (@code{EINVAL}) is returned.

The @code{omp_target_is_present} routine can be used to test whether
associated device storage exists for a host pointer.

Running this routine in a @code{target} region except on the initial device
is not supported.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_target_associate_ptr(const void *host_ptr,}
@item @tab @code{ const void *device_ptr,}
@item @tab @code{ size_t size,}
@item @tab @code{ size_t device_offset,}
@item @tab @code{ int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(c_int) function omp_target_associate_ptr(host_ptr, &}
@item @tab @code{ device_ptr, size, device_offset, device_num) bind(C)}
@item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t}
@item @tab @code{type(c_ptr), value :: host_ptr, device_ptr}
@item @tab @code{integer(c_size_t), value :: size, device_offset}
@item @tab @code{integer(c_int), value :: device_num}
@end multitable

@item @emph{See also}:
@ref{omp_target_disassociate_ptr}, @ref{omp_target_is_present},
@ref{omp_target_alloc}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.9.
@end table


@node omp_target_disassociate_ptr
@subsection @code{omp_target_disassociate_ptr} -- Remove device--host pointer association
@table @asis
@item @emph{Description}:
This routine removes the storage association established by calling
@code{omp_target_associate_ptr} and sets the reference count to zero,
even if @code{omp_target_associate_ptr} was invoked multiple times
for host pointer @var{ptr}. If applicable, the device memory needs
to be freed by the user.

If an associated device storage location for the @var{device_num} was
found and has infinite reference count, the association is removed and
zero is returned. In all other cases, nonzero (@code{EINVAL}) is returned
and no other action is taken.

Note that passing a host pointer where the association to the device pointer
was established with the @code{declare target} directive yields undefined
behavior.

Running this routine in a @code{target} region except on the initial device
is not supported.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_target_disassociate_ptr(const void *ptr,}
@item @tab @code{ int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(c_int) function omp_target_disassociate_ptr(ptr, &}
@item @tab @code{ device_num) bind(C)}
@item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
@item @tab @code{type(c_ptr), value :: ptr}
@item @tab @code{integer(c_int), value :: device_num}
@end multitable

@item @emph{See also}:
@ref{omp_target_associate_ptr}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.10.
@end table
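
The life cycle of an association (allocate, associate, disassociate, free)
might look like the sketch below. The helper @code{associate_roundtrip} and
the serial fallback are hypothetical. As described for
@code{omp_target_associate_ptr}, the association is rejected with a nonzero
value for the initial device, so the sketch only succeeds for a separate
offload device that does not share memory with the host.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback: host only, so association is rejected. */
#include <errno.h>
#include <stdlib.h>
static int omp_get_initial_device(void) { return 0; }
static void *omp_target_alloc(size_t n, int d) { (void)d; return malloc(n); }
static void omp_target_free(void *p, int d) { (void)d; free(p); }
static int omp_target_associate_ptr(const void *h, const void *p, size_t n,
                                    size_t off, int d)
{ (void)h; (void)p; (void)n; (void)off; (void)d; return EINVAL; }
static int omp_target_disassociate_ptr(const void *h, int d)
{ (void)h; (void)d; return EINVAL; }
#endif

/* Returns 0 on success, -1 if the allocation failed, or the nonzero
   value reported by the association routines. */
static int associate_roundtrip(void *host_ptr, size_t size, int dev)
{
    void *dev_ptr = omp_target_alloc(size, dev);
    if (dev_ptr == NULL)
        return -1;

    int err = omp_target_associate_ptr(host_ptr, dev_ptr, size, 0, dev);
    if (err == 0) {
        /* host_ptr could now be used in map clauses for device dev. */
        err = omp_target_disassociate_ptr(host_ptr, dev);
    }
    omp_target_free(dev_ptr, dev);  /* the user frees the device memory */
    return err;
}
```

The device memory is released only after the association has been removed,
in line with the lifetime requirement stated above.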


@node omp_get_mapped_ptr
@subsection @code{omp_get_mapped_ptr} -- Return device pointer to a host pointer
@table @asis
@item @emph{Description}:
If the device number refers to the initial device or to a device with
memory accessible from the host (shared memory), the @code{omp_get_mapped_ptr}
routine returns the value of the passed @var{ptr}. Otherwise, if associated
storage to the passed host pointer @var{ptr} exists on the device associated
with @var{device_num}, it returns that pointer. In all other cases and in
cases of an error, a null pointer is returned.

The association of storage location is established either via an explicit or
implicit @code{map} clause, the @code{declare target} directive or the
@code{omp_target_associate_ptr} routine.

Running this routine in a @code{target} region except on the initial device
is not supported.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *omp_get_mapped_ptr(const void *ptr, int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{type(c_ptr) function omp_get_mapped_ptr(ptr, device_num) bind(C)}
@item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
@item @tab @code{type(c_ptr), value :: ptr}
@item @tab @code{integer(c_int), value :: device_num}
@end multitable

@item @emph{See also}:
@ref{omp_target_associate_ptr}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 18.8.11.
@end table


@node Lock Routines
@section Lock Routines

Initialize, set, test, unset and destroy simple and nested locks.
The routines have C linkage and do not throw exceptions.

@menu
* omp_init_lock:: Initialize simple lock
* omp_init_nest_lock:: Initialize nested lock
@c * omp_init_lock_with_hint:: <fixme>
@c * omp_init_nest_lock_with_hint:: <fixme>
* omp_destroy_lock:: Destroy simple lock
* omp_destroy_nest_lock:: Destroy nested lock
* omp_set_lock:: Wait for and set simple lock
* omp_set_nest_lock:: Wait for and set nested lock
* omp_unset_lock:: Unset simple lock
* omp_unset_nest_lock:: Unset nested lock
* omp_test_lock:: Test and set simple lock if available
* omp_test_nest_lock:: Test and set nested lock if available
@end menu


@node omp_init_lock
@subsection @code{omp_init_lock} -- Initialize simple lock
@table @asis
@item @emph{Description}:
Initialize a simple lock. After initialization, the lock is in
an unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table


@node omp_init_nest_lock
@subsection @code{omp_init_nest_lock} -- Initialize nested lock
@table @asis
@item @emph{Description}:
Initialize a nested lock. After initialization, the lock is in
an unlocked state and the nesting count is set to zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table


@node omp_destroy_lock
@subsection @code{omp_destroy_lock} -- Destroy simple lock
@table @asis
@item @emph{Description}:
Destroy a simple lock. In order to be destroyed, a simple lock must be
in the unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table


@node omp_destroy_nest_lock
@subsection @code{omp_destroy_nest_lock} -- Destroy nested lock
@table @asis
@item @emph{Description}:
Destroy a nested lock. In order to be destroyed, a nested lock must be
in the unlocked state and its nesting count must equal zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table


@node omp_set_lock
@subsection @code{omp_set_lock} -- Wait for and set simple lock
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread,
a deadlock occurs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table


@node omp_set_nest_lock
@subsection @code{omp_set_nest_lock} -- Wait for and set nested lock
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread, the
nesting count for the lock is incremented.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table


@node omp_unset_lock
@subsection @code{omp_unset_lock} -- Unset simple lock
@table @asis
@item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}. Then, the lock becomes unlocked. If one
or more threads attempted to set the lock before, one of them is chosen to
set the lock next.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_set_lock}, @ref{omp_test_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table
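
The usual life cycle of a simple lock (initialize, set, unset, destroy) is
sketched below with a counter that is updated from a parallel loop. The
helper @code{locked_count} and the serial fallback definitions are
hypothetical; the fallback only exists so that the code also builds without
@option{-fopenmp}.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial fallback so the sketch also builds without -fopenmp. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l) { *l = 0; }
static void omp_set_lock(omp_lock_t *l) { *l = 1; }
static void omp_unset_lock(omp_lock_t *l) { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Increment a shared counter n times, one locked update per iteration. */
static long locked_count(int n)
{
    omp_lock_t lock;
    long counter = 0;

    omp_init_lock(&lock);           /* lock starts out unlocked */
    #pragma omp parallel for shared(counter, lock)
    for (int i = 0; i < n; i++) {
        omp_set_lock(&lock);        /* blocks until the lock is available */
        counter++;                  /* protected update */
        omp_unset_lock(&lock);      /* released by the thread that set it */
    }
    omp_destroy_lock(&lock);        /* lock is unlocked at this point */
    return counter;
}
```

For an update this small, an @code{atomic} or @code{critical} construct
would usually be simpler; the lock routines are shown here only to
illustrate the calling sequence.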


@node omp_unset_nest_lock
@subsection @code{omp_unset_nest_lock} -- Unset nested lock
@table @asis
@item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the lock before,
one of them is chosen to set the lock next.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_set_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table


@node omp_test_lock
@subsection @code{omp_test_lock} -- Test and set simple lock if available
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
does not block if the lock is not available. This function returns
@code{true} upon success, @code{false} otherwise. Here, @code{true} and
@code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table
2281
2282
2283
2284@node omp_test_nest_lock
2285@subsection @code{omp_test_nest_lock} -- Test and set nested lock if available
2286@table @asis
2287@item @emph{Description}:
2288Before setting a nested lock, the lock variable must be initialized by
2289@code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
2290@code{omp_test_nest_lock} does not block if the lock is not available.
2291If the lock is available or already held by the current thread, it is set and
2292the new nesting count is returned. Otherwise, the return value equals zero.
2293
2294@item @emph{C/C++}:
2295@multitable @columnfractions .20 .80
2296@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
2297@end multitable
2298
2299@item @emph{Fortran}:
2300@multitable @columnfractions .20 .80
2301@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
d77de738
ML
2302@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2303@end multitable
2304
506f068e 2305
d77de738 2306@item @emph{See also}:
506f068e 2307@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
d77de738
ML
2308
2309@item @emph{Reference}:
506f068e 2310@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
d77de738
ML
2311@end table
2312
2313
2314
506f068e
TB
2315@node Timing Routines
2316@section Timing Routines
2317
2318Portable, thread-based, wall clock timer.
2319The routines have C linkage and do not throw exceptions.
2320
2321@menu
2322* omp_get_wtick:: Get timer precision.
2323* omp_get_wtime:: Elapsed wall clock time.
2324@end menu
2325
2326
2327
d77de738 2328@node omp_get_wtick
506f068e 2329@subsection @code{omp_get_wtick} -- Get timer precision
d77de738
ML
2330@table @asis
2331@item @emph{Description}:
2332Gets the timer precision, i.e., the number of seconds between two
2333successive clock ticks.
2334
2335@item @emph{C/C++}:
2336@multitable @columnfractions .20 .80
2337@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
2338@end multitable
2339
2340@item @emph{Fortran}:
2341@multitable @columnfractions .20 .80
2342@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
2343@end multitable
2344
2345@item @emph{See also}:
2346@ref{omp_get_wtime}
2347
2348@item @emph{Reference}:
2349@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
2350@end table
2351
2352
2353
2354@node omp_get_wtime
506f068e 2355@subsection @code{omp_get_wtime} -- Elapsed wall clock time
d77de738
ML
2356@table @asis
2357@item @emph{Description}:
2358Elapsed wall clock time in seconds. The time is measured per thread; no
2359guarantee can be made that two distinct threads measure the same time.
2360Time is measured from some ``time in the past'', which is an arbitrary time
2361guaranteed not to change during the execution of the program.
2362
2363@item @emph{C/C++}:
2364@multitable @columnfractions .20 .80
2365@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
2366@end multitable
2367
2368@item @emph{Fortran}:
2369@multitable @columnfractions .20 .80
2370@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
2371@end multitable
2372
2373@item @emph{See also}:
2374@ref{omp_get_wtick}
2375
2376@item @emph{Reference}:
2377@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
2378@end table
2379
2380
2381
506f068e
TB
2382@node Event Routine
2383@section Event Routine
2384
2385Support for event objects.
2386The routine has C linkage and does not throw exceptions.
2387
2388@menu
2389* omp_fulfill_event:: Fulfill and destroy an OpenMP event.
2390@end menu
2391
2392
2393
d77de738 2394@node omp_fulfill_event
506f068e 2395@subsection @code{omp_fulfill_event} -- Fulfill and destroy an OpenMP event
d77de738
ML
2396@table @asis
2397@item @emph{Description}:
2398Fulfill the event associated with the event handle argument. Currently, it
2399is only used to fulfill events generated by @code{detach} clauses on task
2400constructs; the effect of fulfilling the event is to allow the task to
2401complete.
2402
2403The result of calling @code{omp_fulfill_event} with an event handle other
2404than that generated by a detach clause is undefined. Calling it with an
2405event handle that has already been fulfilled is also undefined.
2406
2407@item @emph{C/C++}:
2408@multitable @columnfractions .20 .80
2409@item @emph{Prototype}: @tab @code{void omp_fulfill_event(omp_event_handle_t event);}
2410@end multitable
2411
2412@item @emph{Fortran}:
2413@multitable @columnfractions .20 .80
2414@item @emph{Interface}: @tab @code{subroutine omp_fulfill_event(event)}
2415@item @tab @code{integer (kind=omp_event_handle_kind) :: event}
2416@end multitable
2417
2418@item @emph{Reference}:
2419@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.5.1.
2420@end table
2421
2422
2423
506f068e
TB
2424@c @node Interoperability Routines
2425@c @section Interoperability Routines
2426@c
2427@c Routines to obtain properties from an @code{omp_interop_t} object.
2428@c They have C linkage and do not throw exceptions.
2429@c
2430@c @menu
2431@c * omp_get_num_interop_properties:: <fixme>
2432@c * omp_get_interop_int:: <fixme>
2433@c * omp_get_interop_ptr:: <fixme>
2434@c * omp_get_interop_str:: <fixme>
2435@c * omp_get_interop_name:: <fixme>
2436@c * omp_get_interop_type_desc:: <fixme>
2437@c * omp_get_interop_rc_desc:: <fixme>
2438@c @end menu
2439
971f119f
TB
2440@node Memory Management Routines
2441@section Memory Management Routines
2442
2443Routines to manage and allocate memory on the current device.
2444They have C linkage and do not throw exceptions.
2445
2446@menu
2447* omp_init_allocator:: Create an allocator
2448* omp_destroy_allocator:: Destroy an allocator
2449* omp_set_default_allocator:: Set the default allocator
2450* omp_get_default_allocator:: Get the default allocator
bc238c40
TB
2451* omp_alloc:: Memory allocation with an allocator
2452* omp_aligned_alloc:: Memory allocation with an allocator and alignment
2453* omp_free:: Freeing memory allocated with OpenMP routines
2454* omp_calloc:: Allocate nullified memory with an allocator
2455* omp_aligned_calloc:: Allocate nullified aligned memory with an allocator
2456* omp_realloc:: Reallocate memory allocated with OpenMP routines
506f068e
TB
2457@c * omp_get_memspace_num_resources:: <fixme>/TR11
2458@c * omp_get_submemspace:: <fixme>/TR11
971f119f
TB
2459@end menu
2460
2461
2462
2463@node omp_init_allocator
2464@subsection @code{omp_init_allocator} -- Create an allocator
2465@table @asis
2466@item @emph{Description}:
2467Create an allocator that uses the specified memory space and has the specified
2468traits; if an allocator that fulfills the requirements cannot be created,
2469@code{omp_null_allocator} is returned.
2470
2471The predefined memory spaces and available traits can be found at
2472@ref{OMP_ALLOCATOR}, where the trait names have to be prefixed by
2473@code{omp_atk_} (e.g. @code{omp_atk_pinned}) and the named trait values by
2474@code{omp_atv_} (e.g. @code{omp_atv_true}); additionally, @code{omp_atv_default}
2475may be used as trait value to specify that the default value should be used.
2476
2477@item @emph{C/C++}:
2478@multitable @columnfractions .20 .80
2479@item @emph{Prototype}: @tab @code{omp_allocator_handle_t omp_init_allocator(}
2480@item @tab @code{ omp_memspace_handle_t memspace,}
2481@item @tab @code{ int ntraits,}
2482@item @tab @code{ const omp_alloctrait_t traits[]);}
2483@end multitable
2484
2485@item @emph{Fortran}:
2486@multitable @columnfractions .20 .80
2487@item @emph{Interface}: @tab @code{function omp_init_allocator(memspace, ntraits, traits)}
bc238c40
TB
2488@item @tab @code{integer (omp_allocator_handle_kind) :: omp_init_allocator}
2489@item @tab @code{integer (omp_memspace_handle_kind), intent(in) :: memspace}
971f119f
TB
2490@item @tab @code{integer, intent(in) :: ntraits}
2491@item @tab @code{type (omp_alloctrait), intent(in) :: traits(*)}
2492@end multitable
2493
2494@item @emph{See also}:
2495@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_destroy_allocator}
2496
2497@item @emph{Reference}:
2498@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.2
2499@end table
2500
2501
2502
2503@node omp_destroy_allocator
2504@subsection @code{omp_destroy_allocator} -- Destroy an allocator
2505@table @asis
2506@item @emph{Description}:
2507Releases all resources used by a memory allocator, which must not represent
2508a predefined memory allocator. Accessing memory after its allocator has been
2509destroyed has unspecified behavior. Passing @code{omp_null_allocator} to the
15886c03 2510routine is permitted but has no effect.
971f119f
TB
2511
2512
2513@item @emph{C/C++}:
2514@multitable @columnfractions .20 .80
2515@item @emph{Prototype}: @tab @code{void omp_destroy_allocator (omp_allocator_handle_t allocator);}
2516@end multitable
2517
2518@item @emph{Fortran}:
2519@multitable @columnfractions .20 .80
2520@item @emph{Interface}: @tab @code{subroutine omp_destroy_allocator(allocator)}
bc238c40 2521@item @tab @code{integer (omp_allocator_handle_kind), intent(in) :: allocator}
971f119f
TB
2522@end multitable
2523
2524@item @emph{See also}:
2525@ref{omp_init_allocator}
2526
2527@item @emph{Reference}:
2528@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.3
2529@end table
2530
2531
2532
2533@node omp_set_default_allocator
2534@subsection @code{omp_set_default_allocator} -- Set the default allocator
2535@table @asis
2536@item @emph{Description}:
2537Sets the default allocator that is used when no allocator has been specified
2538in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
2539routine is invoked with the @code{omp_null_allocator} allocator.
2540
2541@item @emph{C/C++}:
2542@multitable @columnfractions .20 .80
2543@item @emph{Prototype}: @tab @code{void omp_set_default_allocator(omp_allocator_handle_t allocator);}
2544@end multitable
2545
2546@item @emph{Fortran}:
2547@multitable @columnfractions .20 .80
2548@item @emph{Interface}: @tab @code{subroutine omp_set_default_allocator(allocator)}
bc238c40 2549@item @tab @code{integer (omp_allocator_handle_kind), intent(in) :: allocator}
971f119f
TB
2550@end multitable
2551
2552@item @emph{See also}:
2553@ref{omp_get_default_allocator}, @ref{omp_init_allocator}, @ref{OMP_ALLOCATOR},
2554@ref{Memory allocation}
2555
2556@item @emph{Reference}:
2557@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.4
2558@end table
2559
2560
2561
2562@node omp_get_default_allocator
2563@subsection @code{omp_get_default_allocator} -- Get the default allocator
2564@table @asis
2565@item @emph{Description}:
2566The routine returns the default allocator that is used when no allocator has
2567been specified in the @code{allocate} or @code{allocator} clause or if an
2568OpenMP memory routine is invoked with the @code{omp_null_allocator} allocator.
2569
2570@item @emph{C/C++}:
2571@multitable @columnfractions .20 .80
2572@item @emph{Prototype}: @tab @code{omp_allocator_handle_t omp_get_default_allocator();}
2573@end multitable
2574
2575@item @emph{Fortran}:
2576@multitable @columnfractions .20 .80
2577@item @emph{Interface}: @tab @code{function omp_get_default_allocator()}
bc238c40 2578@item @tab @code{integer (omp_allocator_handle_kind) :: omp_get_default_allocator}
971f119f
TB
2579@end multitable
2580
2581@item @emph{See also}:
2582@ref{omp_set_default_allocator}, @ref{OMP_ALLOCATOR}
2583
2584@item @emph{Reference}:
2585@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.5
2586@end table
2587
2588
506f068e 2589
bc238c40
TB
2590@node omp_alloc
2591@subsection @code{omp_alloc} -- Memory allocation with an allocator
2592@table @asis
2593@item @emph{Description}:
2594Allocate memory with the specified allocator, which can either be a predefined
2595allocator, an allocator handle or @code{omp_null_allocator}. If the allocator
2596is @code{omp_null_allocator}, the allocator specified by the
2597@var{def-allocator-var} ICV is used. @var{size} must be a nonnegative number
2598denoting the number of bytes to be allocated; if @var{size} is zero,
2599@code{omp_alloc} will return a null pointer. If successful, a pointer to the
2600allocated memory is returned, otherwise the @code{fallback} trait of the
2601allocator determines the behavior. The content of the allocated memory is
2602unspecified.
2603
2604In @code{target} regions, either the @code{dynamic_allocators} clause must
2605appear on a @code{requires} directive in the same compilation unit -- or the
2606@var{allocator} argument may only be a constant expression with the value of
2607one of the predefined allocators and may not be @code{omp_null_allocator}.
2608
2609Memory allocated by @code{omp_alloc} must be freed using @code{omp_free}.
2610
2611@item @emph{C}:
2612@multitable @columnfractions .20 .80
2613@item @emph{Prototype}: @tab @code{void* omp_alloc(size_t size,}
2614@item @tab @code{ omp_allocator_handle_t allocator)}
2615@end multitable
2616
2617@item @emph{C++}:
2618@multitable @columnfractions .20 .80
2619@item @emph{Prototype}: @tab @code{void* omp_alloc(size_t size,}
2620@item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2621@end multitable
2622
2623@item @emph{Fortran}:
2624@multitable @columnfractions .20 .80
2625@item @emph{Interface}: @tab @code{type(c_ptr) function omp_alloc(size, allocator) bind(C)}
2626@item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2627@item @tab @code{integer (c_size_t), value :: size}
2628@item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2629@end multitable
2630
2631@item @emph{See also}:
2632@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2633@ref{omp_free}, @ref{omp_init_allocator}
2634
2635@item @emph{Reference}:
2636@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.6
2637@end table
2638
2639
2640
2641@node omp_aligned_alloc
2642@subsection @code{omp_aligned_alloc} -- Memory allocation with an allocator and alignment
2643@table @asis
2644@item @emph{Description}:
2645Allocate memory with the specified allocator, which can either be a predefined
2646allocator, an allocator handle or @code{omp_null_allocator}. If the allocator
2647is @code{omp_null_allocator}, the allocator specified by the
2648@var{def-allocator-var} ICV is used. @var{alignment} must be a positive power
2649of two and @var{size} must be a nonnegative number that is a multiple of the
2650alignment and denotes the number of bytes to be allocated; if @var{size} is
2651zero, @code{omp_aligned_alloc} will return a null pointer. The alignment will
2652be at least the larger of the value required by the @code{alignment} trait of
2653the allocator and the value of the passed @var{alignment} argument. If successful,
2654a pointer to the allocated memory is returned, otherwise the @code{fallback}
2655trait of the allocator determines the behavior. The content of the allocated
2656memory is unspecified.
2657
2658In @code{target} regions, either the @code{dynamic_allocators} clause must
2659appear on a @code{requires} directive in the same compilation unit -- or the
2660@var{allocator} argument may only be a constant expression with the value of
2661one of the predefined allocators and may not be @code{omp_null_allocator}.
2662
2663Memory allocated by @code{omp_aligned_alloc} must be freed using
2664@code{omp_free}.
2665
2666@item @emph{C}:
2667@multitable @columnfractions .20 .80
2668@item @emph{Prototype}: @tab @code{void* omp_aligned_alloc(size_t alignment,}
2669@item @tab @code{ size_t size,}
2670@item @tab @code{ omp_allocator_handle_t allocator)}
2671@end multitable
2672
2673@item @emph{C++}:
2674@multitable @columnfractions .20 .80
2675@item @emph{Prototype}: @tab @code{void* omp_aligned_alloc(size_t alignment,}
2676@item @tab @code{ size_t size,}
2677@item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2678@end multitable
2679
2680@item @emph{Fortran}:
2681@multitable @columnfractions .20 .80
2682@item @emph{Interface}: @tab @code{type(c_ptr) function omp_aligned_alloc(alignment, size, allocator) bind(C)}
2683@item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2684@item @tab @code{integer (c_size_t), value :: alignment, size}
2685@item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2686@end multitable
2687
2688@item @emph{See also}:
2689@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2690@ref{omp_free}, @ref{omp_init_allocator}
2691
2692@item @emph{Reference}:
2693@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.6
2694@end table
2695
2696
2697
2698@node omp_free
2699@subsection @code{omp_free} -- Freeing memory allocated with OpenMP routines
2700@table @asis
2701@item @emph{Description}:
2702The @code{omp_free} routine deallocates memory previously allocated by an
2703OpenMP memory-management routine. The @var{ptr} argument must point to such
2704memory or be a null pointer; if it is a null pointer, no operation is
2705performed. If specified, the @var{allocator} argument must be either the
2706memory allocator that was used for the allocation or @code{omp_null_allocator};
2707if it is @code{omp_null_allocator}, the implementation will determine the value
2708automatically.
2709
2710Calling @code{omp_free} invokes undefined behavior if the memory
2711was already deallocated or when the used allocator has already been destroyed.
2712
2713@item @emph{C}:
2714@multitable @columnfractions .20 .80
2715@item @emph{Prototype}: @tab @code{void omp_free(void *ptr,}
2716@item @tab @code{ omp_allocator_handle_t allocator)}
2717@end multitable
2718
2719@item @emph{C++}:
2720@multitable @columnfractions .20 .80
2721@item @emph{Prototype}: @tab @code{void omp_free(void *ptr,}
2722@item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2723@end multitable
2724
2725@item @emph{Fortran}:
2726@multitable @columnfractions .20 .80
2727@item @emph{Interface}: @tab @code{subroutine omp_free(ptr, allocator) bind(C)}
2728@item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr}
2729@item @tab @code{type (c_ptr), value :: ptr}
2730@item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2731@end multitable
2732
2733@item @emph{See also}:
2734@ref{omp_alloc}, @ref{omp_aligned_alloc}, @ref{omp_calloc},
2735@ref{omp_aligned_calloc}, @ref{omp_realloc}
2736
2737@item @emph{Reference}:
2738@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.7
2739@end table
2740
2741
2742
2743@node omp_calloc
2744@subsection @code{omp_calloc} -- Allocate nullified memory with an allocator
2745@table @asis
2746@item @emph{Description}:
2747Allocate zero-initialized memory with the specified allocator, which can either
2748be a predefined allocator, an allocator handle or @code{omp_null_allocator}. If
2749the allocator is @code{omp_null_allocator}, the allocator specified by the
2750@var{def-allocator-var} ICV is used. The to-be allocated memory is for an
2751array with @var{nmemb} elements, each having a size of @var{size} bytes. Both
2752@var{nmemb} and @var{size} must be nonnegative numbers; if either of them is
2753zero, @code{omp_calloc} will return a null pointer. If successful, a pointer to
2754the zero-initialized allocated memory is returned, otherwise the @code{fallback}
2755trait of the allocator determines the behavior.
2756
2757In @code{target} regions, either the @code{dynamic_allocators} clause must
2758appear on a @code{requires} directive in the same compilation unit -- or the
2759@var{allocator} argument may only be a constant expression with the value of
2760one of the predefined allocators and may not be @code{omp_null_allocator}.
2761
2762Memory allocated by @code{omp_calloc} must be freed using @code{omp_free}.
2763
2764@item @emph{C}:
2765@multitable @columnfractions .20 .80
2766@item @emph{Prototype}: @tab @code{void* omp_calloc(size_t nmemb, size_t size,}
2767@item @tab @code{ omp_allocator_handle_t allocator)}
2768@end multitable
2769
2770@item @emph{C++}:
2771@multitable @columnfractions .20 .80
2772@item @emph{Prototype}: @tab @code{void* omp_calloc(size_t nmemb, size_t size,}
2773@item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2774@end multitable
2775
2776@item @emph{Fortran}:
2777@multitable @columnfractions .20 .80
2778@item @emph{Interface}: @tab @code{type(c_ptr) function omp_calloc(nmemb, size, allocator) bind(C)}
2779@item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2780@item @tab @code{integer (c_size_t), value :: nmemb, size}
2781@item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2782@end multitable
2783
2784@item @emph{See also}:
2785@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2786@ref{omp_free}, @ref{omp_init_allocator}
2787
2788@item @emph{Reference}:
2789@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.8
2790@end table
2791
2792
2793
2794@node omp_aligned_calloc
2795@subsection @code{omp_aligned_calloc} -- Allocate aligned nullified memory with an allocator
2796@table @asis
2797@item @emph{Description}:
2798Allocate zero-initialized memory with the specified allocator, which can either
2799be a predefined allocator, an allocator handle or @code{omp_null_allocator}. If
2800the allocator is @code{omp_null_allocator}, the allocator specified by the
2801@var{def-allocator-var} ICV is used. The to-be allocated memory is for an
2802array with @var{nmemb} elements, each having a size of @var{size} bytes. Both
2803@var{nmemb} and @var{size} must be nonnegative numbers; if either of them is
2804zero, @code{omp_aligned_calloc} will return a null pointer. @var{alignment}
2805must be a positive power of two and @var{size} must be a multiple of the
2806alignment; the alignment will be at least the larger of the value required by
2807the @code{alignment} trait of the allocator and the value of the passed
2808@var{alignment} argument. If successful, a pointer to the zero-initialized
2809allocated memory is returned, otherwise the @code{fallback} trait of the
2810allocator determines the behavior.
2811
2812In @code{target} regions, either the @code{dynamic_allocators} clause must
2813appear on a @code{requires} directive in the same compilation unit -- or the
2814@var{allocator} argument may only be a constant expression with the value of
2815one of the predefined allocators and may not be @code{omp_null_allocator}.
2816
2817Memory allocated by @code{omp_aligned_calloc} must be freed using
2818@code{omp_free}.
2819
2820@item @emph{C}:
2821@multitable @columnfractions .20 .80
2822@item @emph{Prototype}: @tab @code{void* omp_aligned_calloc(size_t alignment, size_t nmemb, size_t size,}
2823@item @tab @code{ omp_allocator_handle_t allocator)}
2824@end multitable
2825
2826@item @emph{C++}:
2827@multitable @columnfractions .20 .80
2828@item @emph{Prototype}: @tab @code{void* omp_aligned_calloc(size_t alignment, size_t nmemb, size_t size,}
2829@item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
2830@end multitable
2831
2832@item @emph{Fortran}:
2833@multitable @columnfractions .20 .80
2834@item @emph{Interface}: @tab @code{type(c_ptr) function omp_aligned_calloc(alignment, nmemb, size, allocator) bind(C)}
2835@item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2836@item @tab @code{integer (c_size_t), value :: alignment, nmemb, size}
2837@item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
2838@end multitable
2839
2840@item @emph{See also}:
2841@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2842@ref{omp_free}, @ref{omp_init_allocator}
2843
2844@item @emph{Reference}:
2845@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.8
2846@end table
2847
2848
2849
2850@node omp_realloc
2851@subsection @code{omp_realloc} -- Reallocate memory allocated with OpenMP routines
2852@table @asis
2853@item @emph{Description}:
2854The @code{omp_realloc} routine deallocates memory to which @var{ptr} points
2855and allocates new memory with the specified @var{allocator} argument; the
2856new memory will have the content of the old memory up to the minimum of the
2857old size and the new @var{size}, otherwise the content of the returned memory
2858is unspecified. If the new allocator is the same as the old one, the routine
2859tries to resize the existing memory allocation, returning the same address as
2860@var{ptr} if successful. @var{ptr} must point to memory allocated by an OpenMP
2861memory-management routine.
2862
2863The @var{allocator} and @var{free_allocator} arguments must be a predefined
2864allocator, an allocator handle or @code{omp_null_allocator}. If
2865@var{free_allocator} is @code{omp_null_allocator}, the implementation
2866automatically determines the allocator used for the allocation of @var{ptr}.
2867If @var{allocator} is @code{omp_null_allocator} and @var{ptr} is not a
2868null pointer, the same allocator as @var{free_allocator} is used; when
2869@var{ptr} is a null pointer, the allocator specified by the
2870@var{def-allocator-var} ICV is used.
2871
2872The @var{size} must be a nonnegative number denoting the number of bytes to be
2873allocated; if @var{size} is zero, @code{omp_realloc} will free the
2874memory and return a null pointer. When @var{size} is nonzero: if successful,
2875a pointer to the allocated memory is returned, otherwise the @code{fallback}
2876trait of the allocator determines the behavior.
2877
2878In @code{target} regions, either the @code{dynamic_allocators} clause must
2879appear on a @code{requires} directive in the same compilation unit -- or the
2880@var{free_allocator} and @var{allocator} arguments may only be constant
2881expressions with the value of one of the predefined allocators and may not be
2882@code{omp_null_allocator}.
2883
2884Memory allocated by @code{omp_realloc} must be freed using @code{omp_free}.
2885Calling @code{omp_free} invokes undefined behavior if the memory
2886was already deallocated or when the used allocator has already been destroyed.
2887
2888@item @emph{C}:
2889@multitable @columnfractions .20 .80
2890@item @emph{Prototype}: @tab @code{void* omp_realloc(void *ptr, size_t size,}
2891@item @tab @code{ omp_allocator_handle_t allocator,}
2892@item @tab @code{ omp_allocator_handle_t free_allocator)}
2893@end multitable
2894
2895@item @emph{C++}:
2896@multitable @columnfractions .20 .80
2897@item @emph{Prototype}: @tab @code{void* omp_realloc(void *ptr, size_t size,}
2898@item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator,}
2899@item @tab @code{ omp_allocator_handle_t free_allocator=omp_null_allocator)}
2900@end multitable
2901
2902@item @emph{Fortran}:
2903@multitable @columnfractions .20 .80
2904@item @emph{Interface}: @tab @code{type(c_ptr) function omp_realloc(ptr, size, allocator, free_allocator) bind(C)}
2905@item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
2906@item @tab @code{type(c_ptr), value :: ptr}
2907@item @tab @code{integer (c_size_t), value :: size}
2908@item @tab @code{integer (omp_allocator_handle_kind), value :: allocator, free_allocator}
2909@end multitable
2910
2911@item @emph{See also}:
2912@ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
2913@ref{omp_free}, @ref{omp_init_allocator}
2914
2915@item @emph{Reference}:
2916@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.9
2917@end table
2918
2919
2920
506f068e
TB
2921@c @node Tool Control Routine
2922@c
2923@c FIXME
2924
2925@c @node Environment Display Routine
2926@c @section Environment Display Routine
2927@c
2928@c Routine to display the OpenMP number and the initial value of ICVs.
2929@c It has C linkage and do not throw exceptions.
2930@c
2931@c menu
2932@c * omp_display_env:: <fixme>
2933@c end menu
2934
d77de738
ML
2935@c ---------------------------------------------------------------------
2936@c OpenMP Environment Variables
2937@c ---------------------------------------------------------------------
2938
2939@node Environment Variables
2940@chapter OpenMP Environment Variables
2941
2942Environment variables beginning with @env{OMP_} are defined by
2cd0689a
TB
2943section 4 of the OpenMP specification in version 4.5 or in a later version
2944of the specification, while those beginning with @env{GOMP_} are GNU extensions.
2945Most @env{OMP_} environment variables have an associated internal control
2946variable (ICV).
2947
2948For any OpenMP environment variable that sets an ICV and is neither
2949@code{OMP_DEFAULT_DEVICE} nor has global ICV scope, associated
2950device-specific environment variables exist. For them, the environment
2951variable without suffix affects the host. The suffix @code{_DEV_} followed
2952by a non-negative device number less than the number of available devices sets
2953the ICV for the corresponding device. The suffix @code{_DEV} sets the ICV
2954of all non-host devices for which a device-specific corresponding environment
2955variable has not been set while the @code{_ALL} suffix sets the ICV of all
2956host and non-host devices for which a more specific corresponding environment
2957variable is not set.
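
The precedence among the suffixed variables can be sketched as a lookup function. This is only an illustration of the order described above, not libgomp's implementation: the function name @code{lookup_icv_env} and the convention that a negative device number denotes the host are assumptions of the sketch, and it does not check the device number against the number of available devices.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: return the value of the most specific
   environment variable that applies to DEVICE for the variable NAME
   (for instance "OMP_NUM_THREADS"), or a null pointer if none is set.
   DEVICE < 0 denotes the host.  */
const char *
lookup_icv_env (const char *name, int device)
{
  char buf[128];
  const char *val;

  assert (name != NULL);

  if (device < 0)
    {
      /* Host: the variable without suffix comes first.  */
      if ((val = getenv (name)) != NULL)
        return val;
    }
  else
    {
      /* Non-host device: NAME_DEV_<n>, then NAME_DEV.  */
      snprintf (buf, sizeof buf, "%s_DEV_%d", name, device);
      if ((val = getenv (buf)) != NULL)
        return val;
      snprintf (buf, sizeof buf, "%s_DEV", name);
      if ((val = getenv (buf)) != NULL)
        return val;
    }

  /* Least specific: NAME_ALL applies to host and non-host devices.  */
  snprintf (buf, sizeof buf, "%s_ALL", name);
  return getenv (buf);
}
```

For example, with @env{OMP_NUM_THREADS_ALL} set to 4 and @env{OMP_NUM_THREADS_DEV_1} set to 128, the sketch yields 128 for device 1 and 4 for the host and for every other device.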
d77de738
ML
2958
2959@menu
73a0d3bf
TB
2960* OMP_ALLOCATOR:: Set the default allocator
2961* OMP_AFFINITY_FORMAT:: Set the format string used for affinity display
d77de738 2962* OMP_CANCELLATION:: Set whether cancellation is activated
73a0d3bf 2963* OMP_DISPLAY_AFFINITY:: Display thread affinity information
d77de738
ML
2964* OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
2965* OMP_DEFAULT_DEVICE:: Set the device used in target regions
2966* OMP_DYNAMIC:: Dynamic adjustment of threads
2967* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
2968* OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
2969* OMP_NESTED:: Nested parallel regions
2970* OMP_NUM_TEAMS:: Specifies the number of teams to use by teams region
2971* OMP_NUM_THREADS:: Specifies the number of threads to use
0b9bd33d
JJ
2972* OMP_PROC_BIND:: Whether threads may be moved between CPUs
2973* OMP_PLACES:: Specifies on which CPUs the threads should be placed
d77de738
ML
2974* OMP_STACKSIZE:: Set default thread stack size
2975* OMP_SCHEDULE:: How threads are scheduled
bc238c40 2976* OMP_TARGET_OFFLOAD:: Controls offloading behavior
d77de738
ML
2977* OMP_TEAMS_THREAD_LIMIT:: Set the maximum number of threads imposed by teams
2978* OMP_THREAD_LIMIT:: Set the maximum number of threads
2979* OMP_WAIT_POLICY:: How waiting threads are handled
2980* GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
2981* GOMP_DEBUG:: Enable debugging output
2982* GOMP_STACKSIZE:: Set default thread stack size
2983* GOMP_SPINCOUNT:: Set the busy-wait spin count
2984* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
2985@end menu
2986
2987
73a0d3bf
TB
2988@node OMP_ALLOCATOR
2989@section @env{OMP_ALLOCATOR} -- Set the default allocator
2990@cindex Environment Variable
2991@table @asis
971f119f 2992@item @emph{ICV:} @var{def-allocator-var}
2cd0689a 2993@item @emph{Scope:} data environment
73a0d3bf
TB
2994@item @emph{Description}:
2995Sets the default allocator that is used when no allocator has been specified
2996in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
2997routine is invoked with the @code{omp_null_allocator} allocator.
2998If unset, @code{omp_default_mem_alloc} is used.
2999
3000The value can either be a predefined allocator or a predefined memory space
3001or a predefined memory space followed by a colon and a comma-separated list
3002of memory trait and value pairs, separated by @code{=}.
3003
2cd0689a
TB
3004Note: The corresponding device environment variables are currently not
3005supported. Therefore, the non-host @var{def-allocator-var} ICVs are always
3006initialized to @code{omp_default_mem_alloc}. However, on all devices,
3007the @code{omp_set_default_allocator} API routine can be used to change
3008the value.
3009
73a0d3bf 3010@multitable @columnfractions .45 .45
a85a106c 3011@headitem Predefined allocators @tab Associated predefined memory spaces
73a0d3bf
TB
3012@item omp_default_mem_alloc @tab omp_default_mem_space
3013@item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
3014@item omp_const_mem_alloc @tab omp_const_mem_space
3015@item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
3016@item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
30486fab
AS
3017@item omp_cgroup_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
3018@item omp_pteam_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
3019@item omp_thread_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
73a0d3bf
TB
3020@end multitable
3021
a85a106c
TB
3022The predefined allocators use the default values for the traits,
3023as listed below, except that the last three allocators have the
3024@code{access} trait set to @code{cgroup}, @code{pteam}, and
3025@code{thread}, respectively.
3026
3027@multitable @columnfractions .25 .40 .25
3028@headitem Trait @tab Allowed values @tab Default value
73a0d3bf
TB
3029@item @code{sync_hint} @tab @code{contended}, @code{uncontended},
3030 @code{serialized}, @code{private}
a85a106c 3031 @tab @code{contended}
73a0d3bf 3032@item @code{alignment} @tab Positive integer being a power of two
a85a106c 3033 @tab 1 byte
73a0d3bf
TB
3034@item @code{access} @tab @code{all}, @code{cgroup},
3035 @code{pteam}, @code{thread}
a85a106c 3036 @tab @code{all}
73a0d3bf 3037@item @code{pool_size} @tab Positive integer
a85a106c 3038 @tab See @ref{Memory allocation}
73a0d3bf
TB
3039@item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
3040 @code{abort_fb}, @code{allocator_fb}
a85a106c 3041 @tab See below
73a0d3bf 3042@item @code{fb_data} @tab @emph{unsupported as it needs an allocator handle}
a85a106c 3043 @tab (none)
73a0d3bf 3044@item @code{pinned} @tab @code{true}, @code{false}
a85a106c 3045 @tab @code{false}
73a0d3bf
TB
3046@item @code{partition} @tab @code{environment}, @code{nearest},
3047 @code{blocked}, @code{interleaved}
a85a106c 3048 @tab @code{environment}
73a0d3bf
TB
3049@end multitable
3050
a85a106c
TB
3051For the @code{fallback} trait, the default value is @code{null_fb} for the
3052@code{omp_default_mem_alloc} allocator and any allocator that is associated
3053with device memory; for all other allocators, it is
3054@code{default_mem_fb}.
3055
73a0d3bf
TB
3056Examples:
3057@smallexample
3058OMP_ALLOCATOR=omp_high_bw_mem_alloc
3059OMP_ALLOCATOR=omp_large_cap_mem_space
506f068e 3060OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest
73a0d3bf
TB
3061@end smallexample
3062
a85a106c 3063@item @emph{See also}:
971f119f 3064@ref{Memory allocation}, @ref{omp_get_default_allocator},
30486fab 3065@ref{omp_set_default_allocator}, @ref{Offload-Target Specifics}
73a0d3bf
TB
3066
3067@item @emph{Reference}:
3068@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
3069@end table
3070
3071
3072
3073@node OMP_AFFINITY_FORMAT
3074@section @env{OMP_AFFINITY_FORMAT} -- Set the format string used for affinity display
3075@cindex Environment Variable
3076@table @asis
2cd0689a
TB
3077@item @emph{ICV:} @var{affinity-format-var}
3078@item @emph{Scope:} device
73a0d3bf
TB
3079@item @emph{Description}:
3080Sets the format string used when displaying OpenMP thread affinity information.
3081Special values are output using @code{%} followed by an optional size
3082specification and then either the single-character field type or its long
15886c03 3083name enclosed in curly braces; using @code{%%} displays a literal percent.
73a0d3bf 3084The size specification consists of an optional @code{0.} or @code{.} followed
450b05ce 3085by a positive integer, specifying the minimal width of the output. With
73a0d3bf
TB
3086@code{0.} and numerical values, the output is padded with zeros on the left;
3087with @code{.}, the output is padded by spaces on the left; otherwise, the
3088output is padded by spaces on the right. If unset, the value is
3089``@code{level %L thread %i affinity %A}''.
3090
3091Supported field types are:
3092
3093@multitable @columnfractions .10 .25 .60
3094@item t @tab team_num @tab value returned by @code{omp_get_team_num}
3095@item T @tab num_teams @tab value returned by @code{omp_get_num_teams}
3096@item L @tab nesting_level @tab value returned by @code{omp_get_level}
3097@item n @tab thread_num @tab value returned by @code{omp_get_thread_num}
3098@item N @tab num_threads @tab value returned by @code{omp_get_num_threads}
3099@item a @tab ancestor_tnum
3100 @tab value returned by
3101 @code{omp_get_ancestor_thread_num(omp_get_level()-1)}
3102@item H @tab host @tab name of the host that executes the thread
450b05ce
TB
3103@item P @tab process_id @tab process identifier
3104@item i @tab native_thread_id @tab native thread identifier
73a0d3bf
TB
3105@item A @tab thread_affinity
3106 @tab comma-separated list of integer values or ranges, representing the
3107 processors on which a process might execute, subject to affinity
3108 mechanisms
3109@end multitable
3110
3111For instance, after setting
3112
3113@smallexample
3114OMP_AFFINITY_FORMAT="%0.2a!%n!%.4L!%N;%.2t;%0.2T;%@{team_num@};%@{num_teams@};%A"
3115@end smallexample
3116
3117with either @code{OMP_DISPLAY_AFFINITY} being set or when calling
3118@code{omp_display_affinity} with @code{NULL} or an empty string, the program
3119might display the following:
3120
3121@smallexample
312200!0!   1!4; 0;01;0;1;0-11
312300!3!   1!4; 0;01;0;1;0-11
312400!2!   1!4; 0;01;0;1;0-11
312500!1!   1!4; 0;01;0;1;0-11
3126@end smallexample
3127
3128@item @emph{See also}:
3129@ref{OMP_DISPLAY_AFFINITY}
3130
3131@item @emph{Reference}:
3132@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.14
3133@end table
3134
3135
3136
d77de738
ML
3137@node OMP_CANCELLATION
3138@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
3139@cindex Environment Variable
3140@table @asis
2cd0689a
TB
3141@item @emph{ICV:} @var{cancel-var}
3142@item @emph{Scope:} global
d77de738
ML
3143@item @emph{Description}:
3144If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
3145if unset, cancellation is disabled and the @code{cancel} construct is ignored.
3146
3147@item @emph{See also}:
3148@ref{omp_get_cancellation}
3149
3150@item @emph{Reference}:
3151@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
3152@end table
3153
3154
3155
73a0d3bf
TB
3156@node OMP_DISPLAY_AFFINITY
3157@section @env{OMP_DISPLAY_AFFINITY} -- Display thread affinity information
3158@cindex Environment Variable
3159@table @asis
2cd0689a
TB
3160@item @emph{ICV:} @var{display-affinity-var}
3161@item @emph{Scope:} global
73a0d3bf
TB
3162@item @emph{Description}:
3163If set to @code{FALSE} or if unset, affinity displaying is disabled.
15886c03 3164If set to @code{TRUE}, the runtime displays affinity information about
73a0d3bf
TB
3165OpenMP threads in a parallel region upon entering the region and every time
3166any change occurs.
3167
3168@item @emph{See also}:
3169@ref{OMP_AFFINITY_FORMAT}
3170
3171@item @emph{Reference}:
3172@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.13
3173@end table
3174
3175
3176
3177
d77de738
ML
3178@node OMP_DISPLAY_ENV
3179@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
3180@cindex Environment Variable
3181@table @asis
2cd0689a
TB
3182@item @emph{ICV:} none
3183@item @emph{Scope:} not applicable
d77de738
ML
3184@item @emph{Description}:
3185If set to @code{TRUE}, the OpenMP version number and the values
3186associated with the OpenMP environment variables are printed to @code{stderr}.
3187If set to @code{VERBOSE}, it additionally shows the value of the environment
3188variables which are GNU extensions. If undefined or set to @code{FALSE},
15886c03 3189this information is not shown.
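
As an illustrative sketch, assuming a POSIX shell and a hypothetical OpenMP
program named @file{./a.out}, the following prints the settings, including
those of the GNU extensions, when the program starts:

@smallexample
OMP_DISPLAY_ENV=VERBOSE ./a.out
@end smallexample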
d77de738
ML
3190
3191
3192@item @emph{Reference}:
3193@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
3194@end table
3195
3196
3197
3198@node OMP_DEFAULT_DEVICE
3199@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
3200@cindex Environment Variable
3201@table @asis
2cd0689a
TB
3202@item @emph{ICV:} @var{default-device-var}
3203@item @emph{Scope:} data environment
d77de738
ML
3204@item @emph{Description}:
3205Set to choose the device which is used in a @code{target} region, unless the
3206value is overridden by @code{omp_set_default_device} or by a @code{device}
3207clause. The value shall be the nonnegative device number. If no device with
3208the given device number exists, the code is executed on the host. If unset,
18c8b56c
TB
3209@env{OMP_TARGET_OFFLOAD} is @code{mandatory} and no non-host devices are
3210available, it is set to @code{omp_invalid_device}. Otherwise, if unset,
15886c03 3211device number 0 is used.
d77de738
ML
3212
3213
3214@item @emph{See also}:
3215@ref{omp_get_default_device}, @ref{omp_set_default_device},
8bd11fa4 3216@ref{OMP_TARGET_OFFLOAD}
d77de738
ML
3217
3218@item @emph{Reference}:
8bd11fa4 3219@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.7
d77de738
ML
3220@end table
3221
3222
3223
3224@node OMP_DYNAMIC
3225@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
3226@cindex Environment Variable
3227@table @asis
2cd0689a
TB
3228@item @emph{ICV:} @var{dyn-var}
3229@item @emph{Scope:} global
d77de738
ML
3230@item @emph{Description}:
3231Enable or disable the dynamic adjustment of the number of threads
3232within a team. The value of this environment variable shall be
3233@code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
3234disabled by default.
3235
3236@item @emph{See also}:
3237@ref{omp_set_dynamic}
3238
3239@item @emph{Reference}:
3240@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
3241@end table
3242
3243
3244
3245@node OMP_MAX_ACTIVE_LEVELS
3246@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
3247@cindex Environment Variable
3248@table @asis
2cd0689a
TB
3249@item @emph{ICV:} @var{max-active-levels-var}
3250@item @emph{Scope:} data environment
d77de738
ML
3251@item @emph{Description}:
3252Specifies the initial value for the maximum number of nested parallel
3253regions. The value of this variable shall be a positive integer.
3254If undefined, then if @env{OMP_NESTED} is defined and set to true, or
3255if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
3256a list with more than one item, the maximum number of nested parallel
15886c03
TB
3257regions is initialized to the largest number supported, otherwise
3258it is set to one.
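
For example, the following setting (a minimal sketch; the value 2 is
arbitrary) permits at most two levels of active nested parallel regions:

@smallexample
OMP_MAX_ACTIVE_LEVELS=2
@end smallexample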
d77de738
ML
3259
3260@item @emph{See also}:
2cd0689a
TB
3261@ref{omp_set_max_active_levels}, @ref{OMP_NESTED}, @ref{OMP_PROC_BIND},
3262@ref{OMP_NUM_THREADS}
3263
d77de738
ML
3264
3265@item @emph{Reference}:
3266@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
3267@end table
3268
3269
3270
3271@node OMP_MAX_TASK_PRIORITY
3272@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority number that can be set for a task
3274@cindex Environment Variable
3275@table @asis
2cd0689a
TB
3276@item @emph{ICV:} @var{max-task-priority-var}
3277@item @emph{Scope:} global
d77de738
ML
3278@item @emph{Description}:
3279Specifies the initial value for the maximum priority value that can be
3280set for a task. The value of this variable shall be a non-negative
3281integer, and zero is allowed. If undefined, the default priority is
32820.
3283
3284@item @emph{See also}:
3285@ref{omp_get_max_task_priority}
3286
3287@item @emph{Reference}:
3288@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
3289@end table
3290
3291
3292
3293@node OMP_NESTED
3294@section @env{OMP_NESTED} -- Nested parallel regions
3295@cindex Environment Variable
3296@cindex Implementation specific setting
3297@table @asis
2cd0689a
TB
3298@item @emph{ICV:} @var{max-active-levels-var}
3299@item @emph{Scope:} data environment
d77de738
ML
3300@item @emph{Description}:
3301Enable or disable nested parallel regions, i.e., whether team members
3302are allowed to create new teams. The value of this environment variable
3303shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the number
15886c03
TB
3304of maximum active nested regions supported is by default set to the
3305maximum supported, otherwise it is set to one. If
3306@env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting overrides this
d77de738
ML
3307setting. If both are undefined, nested parallel regions are enabled if
3308@env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined to a list with
3309more than one item, otherwise they are disabled by default.
3310
2cd0689a
TB
3311Note that the @code{OMP_NESTED} environment variable was deprecated in
3312the OpenMP specification 5.2 in favor of @code{OMP_MAX_ACTIVE_LEVELS}.
3313
d77de738 3314@item @emph{See also}:
2cd0689a
TB
3315@ref{omp_set_max_active_levels}, @ref{omp_set_nested},
3316@ref{OMP_MAX_ACTIVE_LEVELS}
d77de738
ML
3317
3318@item @emph{Reference}:
3319@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
3320@end table
3321
3322
3323
3324@node OMP_NUM_TEAMS
3325@section @env{OMP_NUM_TEAMS} -- Specifies the number of teams to use by teams region
3326@cindex Environment Variable
3327@table @asis
2cd0689a
TB
3328@item @emph{ICV:} @var{nteams-var}
3329@item @emph{Scope:} device
d77de738
ML
3330@item @emph{Description}:
3331Specifies the upper bound for the number of teams to use in teams regions
3332without an explicit @code{num_teams} clause. The value of this variable shall
3333be a positive integer. If undefined, it defaults to 0, which means an
3334implementation-defined upper bound.
3335
3336@item @emph{See also}:
3337@ref{omp_set_num_teams}
3338
3339@item @emph{Reference}:
3340@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.23
3341@end table
3342
3343
3344
3345@node OMP_NUM_THREADS
3346@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
3347@cindex Environment Variable
3348@cindex Implementation specific setting
3349@table @asis
2cd0689a
TB
3350@item @emph{ICV:} @var{nthreads-var}
3351@item @emph{Scope:} data environment
d77de738
ML
3352@item @emph{Description}:
3353Specifies the default number of threads to use in parallel regions. The
3354value of this variable shall be a comma-separated list of positive integers;
3355the value specifies the number of threads to use for the corresponding nested
15886c03 3356level. Specifying more than one item in the list automatically enables
d77de738
ML
3357nesting by default. If undefined, one thread per CPU is used.
3358
2cd0689a
TB
3359When a list with more than one value is specified, it also affects the
3360@var{max-active-levels-var} ICV as described in @ref{OMP_MAX_ACTIVE_LEVELS}.
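
For illustration (the thread counts are arbitrary), the first setting below
requests eight threads for parallel regions, while the second requests four
threads at the outermost level and two threads for parallel regions nested
one level deeper:

@smallexample
OMP_NUM_THREADS=8
OMP_NUM_THREADS=4,2
@end smallexample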
3361
d77de738 3362@item @emph{See also}:
2cd0689a 3363@ref{omp_set_num_threads}, @ref{OMP_MAX_ACTIVE_LEVELS}
d77de738
ML
3364
3365@item @emph{Reference}:
3366@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
3367@end table
3368
3369
3370
3371@node OMP_PROC_BIND
0b9bd33d 3372@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
d77de738
ML
3373@cindex Environment Variable
3374@table @asis
2cd0689a
TB
3375@item @emph{ICV:} @var{bind-var}
3376@item @emph{Scope:} data environment
d77de738
ML
3377@item @emph{Description}:
3378Specifies whether threads may be moved between processors. If set to
0b9bd33d 3379@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
d77de738
ML
3380they may be moved. Alternatively, a comma-separated list with the
3381values @code{PRIMARY}, @code{MASTER}, @code{CLOSE} and @code{SPREAD} can
3382be used to specify the thread affinity policy for the corresponding nesting
3383level. With @code{PRIMARY} and @code{MASTER} the worker threads are in the
3384same place partition as the primary thread. With @code{CLOSE} those are
3385kept close to the primary thread in contiguous place partitions. And
3386with @code{SPREAD} a sparse distribution
3387across the place partitions is used. Specifying more than one item in the
15886c03 3388list automatically enables nesting by default.
d77de738 3389
2cd0689a
TB
3390When a list is specified, it also affects the @var{max-active-levels-var} ICV
3391as described in @ref{OMP_MAX_ACTIVE_LEVELS}.
3392
d77de738
ML
3393When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
3394@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
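
As a sketch of the two forms described above, the first setting below only
requests that threads are not moved, while the second requests a sparse
distribution at the outermost level and placement close to the primary
thread at the next nesting level:

@smallexample
OMP_PROC_BIND=TRUE
OMP_PROC_BIND=SPREAD,CLOSE
@end smallexample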
3395
3396@item @emph{See also}:
2cd0689a
TB
3397@ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY}, @ref{OMP_PLACES},
3398@ref{OMP_MAX_ACTIVE_LEVELS}
d77de738
ML
3399
3400@item @emph{Reference}:
3401@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
3402@end table
3403
3404
3405
3406@node OMP_PLACES
0b9bd33d 3407@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
d77de738
ML
3408@cindex Environment Variable
3409@table @asis
2cd0689a
TB
3410@item @emph{ICV:} @var{place-partition-var}
3411@item @emph{Scope:} implicit tasks
d77de738
ML
3412@item @emph{Description}:
3413The thread placement can be either specified using an abstract name or by an
3414explicit list of the places. The abstract names @code{threads}, @code{cores},
3415@code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
3416followed by a positive number in parentheses, which denotes how many places
3417shall be created. With @code{threads} each place corresponds to a single
3418hardware thread; @code{cores} to a single core with the corresponding number of
3419hardware threads; with @code{sockets} the place corresponds to a single
3420socket; with @code{ll_caches} to a set of cores that shares the last level
3421cache on the device; and @code{numa_domains} to a set of cores for which their
3422closest memory on the device is the same memory and at a similar distance from
3423the cores. The resulting placement can be shown by setting the
3424@env{OMP_DISPLAY_ENV} environment variable.
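
For instance, assuming a POSIX shell (the quotes keep the shell from
interpreting the parentheses; the counts are arbitrary), the abstract names
might be used as follows. The first setting creates one place per core, the
second creates four places of one hardware thread each, and the third
creates two places of one socket each:

@smallexample
OMP_PLACES=cores
OMP_PLACES="threads(4)"
OMP_PLACES="sockets(2)"
@end smallexample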
3425
3426Alternatively, the placement can be specified explicitly as a comma-separated
3427list of places. A place is specified by a set of nonnegative numbers in curly
3428braces, denoting the hardware threads. The curly braces can be omitted
3429when only a single number has been specified. The hardware threads
3430belonging to a place can either be specified as a comma-separated list of
3431nonnegative thread numbers or using an interval. Multiple places can also be
3432either specified by a comma-separated list of places or by an interval. To
3433specify an interval, a colon followed by the count is placed after
3434the hardware thread number or the place. Optionally, the count can be
3435followed by a colon and the stride number -- otherwise a unit stride is
3436assumed. Placing an exclamation mark (@code{!}) directly before a curly
15886c03
TB
3437brace or numbers inside the curly braces (excluding intervals)
3438excludes those hardware threads.
d77de738
ML
3439
3440For instance, the following all specify the same places list:
3441@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
3442@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.
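
As a further sketch of the interval notation, derived from the rules above
with arbitrarily chosen hardware thread numbers, each setting below is
followed by a comment giving the explicit list it expands to:

@smallexample
OMP_PLACES="@{0:4@}"       # same as "@{0,1,2,3@}"
OMP_PLACES="@{0:4:2@}"     # same as "@{0,2,4,6@}"
OMP_PLACES="@{0:2@}:3:4"   # same as "@{0,1@},@{4,5@},@{8,9@}"
@end smallexample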
3443
3444If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
3445@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
3446between CPUs following no placement policy.
3447
3448@item @emph{See also}:
3449@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
3450@ref{OMP_DISPLAY_ENV}
3451
3452@item @emph{Reference}:
3453@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
3454@end table
3455
3456
3457
3458@node OMP_STACKSIZE
3459@section @env{OMP_STACKSIZE} -- Set default thread stack size
3460@cindex Environment Variable
3461@table @asis
2cd0689a
TB
3462@item @emph{ICV:} @var{stacksize-var}
3463@item @emph{Scope:} device
d77de738
ML
3464@item @emph{Description}:
3465Set the default thread stack size in kilobytes, unless the number
3466is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
3467case the size is, respectively, in bytes, kilobytes, megabytes
3468or gigabytes. This is different from @code{pthread_attr_setstacksize}
3469which gets the number of bytes as an argument. If the stack size cannot
3470be set due to system constraints, an error is reported and the initial
3471stack size is left unchanged. If undefined, the stack size is system
3472dependent.
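
For illustration (the sizes are arbitrary), the following settings request
a default thread stack size of 512 kilobytes, 16 megabytes, and 1 gigabyte,
respectively:

@smallexample
OMP_STACKSIZE=512
OMP_STACKSIZE=16M
OMP_STACKSIZE=1G
@end smallexample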
3473
2cd0689a
TB
3474@item @emph{See also}:
3475@ref{GOMP_STACKSIZE}
3476
d77de738
ML
3477@item @emph{Reference}:
3478@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
3479@end table
3480
3481
3482
3483@node OMP_SCHEDULE
3484@section @env{OMP_SCHEDULE} -- How threads are scheduled
3485@cindex Environment Variable
3486@cindex Implementation specific setting
3487@table @asis
2cd0689a
TB
3488@item @emph{ICV:} @var{run-sched-var}
3489@item @emph{Scope:} data environment
d77de738
ML
3490@item @emph{Description}:
3491Allows specifying the @code{schedule type} and @code{chunk size}.
3492The value of the variable shall have the form @code{type[,chunk]}, where
3493@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}.
3494The optional @code{chunk} size shall be a positive integer. If undefined,
3495dynamic scheduling and a chunk size of 1 are used.
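
As a sketch of the @code{type[,chunk]} form (the chunk sizes are arbitrary),
the settings below select static scheduling without a chunk size, dynamic
scheduling in chunks of 16 iterations, and guided scheduling with a chunk
size of 4. The setting takes effect for loops that use the
@code{schedule(runtime)} clause:

@smallexample
OMP_SCHEDULE=static
OMP_SCHEDULE=dynamic,16
OMP_SCHEDULE="guided,4"
@end smallexample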
3496
3497@item @emph{See also}:
3498@ref{omp_set_schedule}
3499
3500@item @emph{Reference}:
3501@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
3502@end table
3503
3504
3505
3506@node OMP_TARGET_OFFLOAD
bc238c40 3507@section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behavior
d77de738
ML
3508@cindex Environment Variable
3509@cindex Implementation specific setting
3510@table @asis
2cd0689a
TB
3511@item @emph{ICV:} @var{target-offload-var}
3512@item @emph{Scope:} global
d77de738 3513@item @emph{Description}:
bc238c40 3514Specifies the behavior with regard to offloading code to a device. This
d77de738
ML
3515variable can be set to one of three values: @code{MANDATORY}, @code{DISABLED}
3516or @code{DEFAULT}.
3517
15886c03 3518If set to @code{MANDATORY}, the program terminates with an error if
8bd11fa4
TB
3519any device construct or device memory routine uses a device that is unavailable
3520or not supported by the implementation, or uses a non-conforming device number.
15886c03
TB
3521If set to @code{DISABLED}, then offloading is disabled and all code runs on
3522the host. If set to @code{DEFAULT}, the program tries offloading to the
3523device first, then falls back to running code on the host if it cannot.
d77de738 3524
15886c03 3525If undefined, then the program behaves as if @code{DEFAULT} was set.
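
For example, the first setting below makes the program terminate with an
error instead of falling back to the host when a required device is
unavailable, which can help to detect unintended host execution; the second
forces all code to run on the host:

@smallexample
OMP_TARGET_OFFLOAD=MANDATORY
OMP_TARGET_OFFLOAD=DISABLED
@end smallexample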
d77de738 3526
15886c03 3527Note: Even with @code{MANDATORY}, no run-time termination is performed when
8bd11fa4
TB
3528the device number in a @code{device} clause or argument to a device memory
3529routine is for the host, which includes using the device number in the
3530@var{default-device-var} ICV. However, the initial value of
3531the @var{default-device-var} ICV is affected by @code{MANDATORY}.
3532
3533@item @emph{See also}:
3534@ref{OMP_DEFAULT_DEVICE}
3535
d77de738 3536@item @emph{Reference}:
8bd11fa4 3537@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.8
d77de738
ML
3538@end table
3539
3540
3541
3542@node OMP_TEAMS_THREAD_LIMIT
3543@section @env{OMP_TEAMS_THREAD_LIMIT} -- Set the maximum number of threads imposed by teams
3544@cindex Environment Variable
3545@table @asis
2cd0689a
TB
3546@item @emph{ICV:} @var{teams-thread-limit-var}
3547@item @emph{Scope:} device
d77de738
ML
3548@item @emph{Description}:
3549Specifies an upper bound for the number of threads to use by each contention
3550group created by a teams construct without explicit @code{thread_limit}
3551clause. The value of this variable shall be a positive integer. If undefined,
3552the value of 0 is used which stands for an implementation defined upper
3553limit.
3554
3555@item @emph{See also}:
3556@ref{OMP_THREAD_LIMIT}, @ref{omp_set_teams_thread_limit}
3557
3558@item @emph{Reference}:
3559@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.24
3560@end table
3561
3562
3563
3564@node OMP_THREAD_LIMIT
3565@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
3566@cindex Environment Variable
3567@table @asis
2cd0689a
TB
3568@item @emph{ICV:} @var{thread-limit-var}
3569@item @emph{Scope:} data environment
d77de738
ML
3570@item @emph{Description}:
3571Specifies the number of threads to use for the whole program. The
3572value of this variable shall be a positive integer. If undefined,
3573the number of threads is not limited.
3574
3575@item @emph{See also}:
3576@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
3577
3578@item @emph{Reference}:
3579@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
3580@end table
3581
3582
3583
3584@node OMP_WAIT_POLICY
3585@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
3586@cindex Environment Variable
3587@table @asis
3588@item @emph{Description}:
3589Specifies whether waiting threads should be active or passive. If
3590the value is @code{PASSIVE}, waiting threads should not consume CPU
3591power while waiting; the value @code{ACTIVE} specifies that
3592they should. If undefined, threads wait actively for a short time
3593before waiting passively.
3594
3595@item @emph{See also}:
3596@ref{GOMP_SPINCOUNT}
3597
3598@item @emph{Reference}:
3599@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
3600@end table
3601
3602
3603
3604@node GOMP_CPU_AFFINITY
3605@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
3606@cindex Environment Variable
3607@table @asis
3608@item @emph{Description}:
3609Binds threads to specific CPUs. The variable should contain a space-separated
3610or comma-separated list of CPUs. This list may contain different kinds of
3611entries: either single CPU numbers in any order, a range of CPUs (M-N)
3612or a range with some stride (M-N:S). CPU numbers are zero based. For example,
15886c03 3613@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} binds the initial thread
d77de738
ML
3614to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
3615CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
15886c03 3616and 14 respectively and then starts assigning back from the beginning of
d77de738
ML
3617the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
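
As a further sketch of the range-with-stride form (the CPU numbers are
arbitrary), the following setting binds successive threads to CPUs 0, 2, 4,
and 6 and then starts assigning again from the beginning of the list:

@smallexample
GOMP_CPU_AFFINITY="0-7:2"
@end smallexample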
3618
3619There is no libgomp library routine to determine whether a CPU affinity
3620specification is in effect. As a workaround, language-specific library
3621functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
3622Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
3623environment variable. A defined CPU affinity on startup cannot be changed
3624or disabled during the runtime of the application.
3625
3626If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
3627@env{OMP_PROC_BIND} has a higher precedence. If neither has been set,
3628or when @env{OMP_PROC_BIND} is set to
15886c03 3629@code{FALSE}, the host system handles the assignment of threads to CPUs.
d77de738
ML
3630
3631@item @emph{See also}:
3632@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
3633@end table
3634
3635
3636
3637@node GOMP_DEBUG
3638@section @env{GOMP_DEBUG} -- Enable debugging output
3639@cindex Environment Variable
3640@table @asis
3641@item @emph{Description}:
3642Enable debugging output. The variable should be set to @code{0}
3643(disabled, also the default if not set), or @code{1} (enabled).
3644
15886c03 3645If enabled, some debugging output is printed during execution.
d77de738
ML
3646This is currently not specified in more detail, and subject to change.
3647@end table
3648
3649
3650
3651@node GOMP_STACKSIZE
3652@section @env{GOMP_STACKSIZE} -- Set default thread stack size
3653@cindex Environment Variable
3654@cindex Implementation specific setting
3655@table @asis
3656@item @emph{Description}:
3657Set the default thread stack size in kilobytes. This is different from
3658@code{pthread_attr_setstacksize} which gets the number of bytes as an
3659argument. If the stack size cannot be set due to system constraints, an
3660error is reported and the initial stack size is left unchanged. If undefined,
3661the stack size is system dependent.
3662
3663@item @emph{See also}:
3664@ref{OMP_STACKSIZE}
3665
3666@item @emph{Reference}:
3667@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
3668GCC Patches Mailinglist},
3669@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
3670GCC Patches Mailinglist}
3671@end table
3672
3673
3674
3675@node GOMP_SPINCOUNT
3676@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
3677@cindex Environment Variable
3678@cindex Implementation specific setting
3679@table @asis
3680@item @emph{Description}:
3681Determines how long a thread waits actively, consuming CPU power,
3682before waiting passively without consuming CPU power. The value may be
3683either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
3684integer which gives the number of spins of the busy-wait loop. The
3685integer may optionally be followed by the following suffixes acting
3686as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
3687million), @code{G} (giga, billion), or @code{T} (tera, trillion).
3688If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
3689300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
369030 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
3691If there are more OpenMP threads than available CPUs, 1000 and 100
3692spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
3693undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
3694or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
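
For illustration (the numeric values are arbitrary), the settings below
request zero spins of the busy-wait loop, 500 thousand spins using the
@code{k} suffix, and unbounded active waiting, respectively:

@smallexample
GOMP_SPINCOUNT=0
GOMP_SPINCOUNT=500k
GOMP_SPINCOUNT=INFINITE
@end smallexample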
3695
3696@item @emph{See also}:
3697@ref{OMP_WAIT_POLICY}
3698@end table
3699
3700
3701
3702@node GOMP_RTEMS_THREAD_POOLS
3703@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
3704@cindex Environment Variable
3705@cindex Implementation specific setting
3706@table @asis
3707@item @emph{Description}:
3708This environment variable is only used on the RTEMS real-time operating system.
3709It determines the scheduler instance specific thread pools. The format for
3710@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
3711@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
3712separated by @code{:} where:
3713@itemize @bullet
3714@item @code{<thread-pool-count>} is the thread pool count for this scheduler
3715instance.
3716@item @code{$<priority>} is an optional priority for the worker threads of a
3717thread pool according to @code{pthread_setschedparam}. In case a priority
15886c03 3718value is omitted, then a worker thread inherits the priority of the OpenMP
d77de738
ML
3719primary thread that created it. The priority of the worker thread is not
3720changed after creation, even if a new OpenMP primary thread using the worker has
3721a different priority.
3722@item @code{@@<scheduler-name>} is the scheduler instance name according to the
3723RTEMS application configuration.
3724@end itemize
3725In case no thread pool configuration is specified for a scheduler instance,
15886c03 3726then each OpenMP primary thread of this scheduler instance uses its own
d77de738
ML
3727dynamically allocated thread pool. To limit the worker thread count of the
3728thread pools, each OpenMP primary thread must call @code{omp_set_num_threads}.
3729@item @emph{Example}:
3730Suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
3731@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
3732@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
3733scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
3734one thread pool available. Since no priority is specified for this scheduler
3735instance, the worker thread inherits the priority of the OpenMP primary thread
3736that created it. In the scheduler instance @code{WRK1} there are three thread
3737pools available and their worker threads run at priority four.
3738@end table
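
The configuration format described above can be illustrated with a small
parser. The following is a minimal sketch in C and not the code that libgomp
uses on RTEMS. The names @code{pool_config} and @code{parse_pool_config} are
hypothetical, scheduler names are assumed to be shorter than 32 characters,
and empty list entries are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one "<count>[$<priority>]@<scheduler-name>" entry. */
struct pool_config {
  unsigned long count;   /* thread pool count */
  int has_priority;      /* 0 if "$<priority>" was omitted */
  long priority;         /* valid only when has_priority != 0 */
  char scheduler[32];    /* scheduler instance name */
};

/* Parse one entry starting at *spec.  On success, advance *spec past the
   entry and a following ':' separator and return 0.  Return -1 if the
   entry is malformed.  */
int parse_pool_config(const char **spec, struct pool_config *out)
{
  assert(spec != NULL && *spec != NULL && out != NULL);
  const char *p = *spec;
  char *end;

  out->count = strtoul(p, &end, 10);
  if (end == p)
    return -1;                        /* missing thread pool count */
  p = end;

  out->has_priority = 0;
  out->priority = 0;
  if (*p == '$') {
    ++p;
    out->priority = strtol(p, &end, 10);
    if (end == p)
      return -1;                      /* '$' without a number */
    out->has_priority = 1;
    p = end;
  }

  if (*p != '@')
    return -1;                        /* the scheduler name is mandatory */
  ++p;

  size_t len = strcspn(p, ":");       /* the name ends at ':' or end of string */
  if (len == 0 || len >= sizeof out->scheduler)
    return -1;
  memcpy(out->scheduler, p, len);
  out->scheduler[len] = '\0';
  p += len;
  if (*p == ':')
    ++p;                              /* skip the list separator */
  *spec = p;
  return 0;
}
```

Calling @code{parse_pool_config} twice on the value from the example yields
one pool without a priority for @code{WRK0}, then three pools at priority
four for @code{WRK1}.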
3739
3740
3741
3742@c ---------------------------------------------------------------------
3743@c Enabling OpenACC
3744@c ---------------------------------------------------------------------
3745
3746@node Enabling OpenACC
3747@chapter Enabling OpenACC
3748
3749To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
3750flag @option{-fopenacc} must be specified. This enables the OpenACC directive
3751@samp{#pragma acc} in C/C++ and, in Fortran, the @samp{!$acc} sentinel in free
3752source form and the @samp{c$acc}, @samp{*$acc} and @samp{!$acc} sentinels in
3753fixed source form. The flag also arranges for automatic linking of the OpenACC
3754runtime library (@ref{OpenACC Runtime Library Routines}).
3755
3756See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
3757
3758A complete description of all OpenACC directives accepted may be found in
3759the @uref{https://www.openacc.org, OpenACC} Application Programming
3760Interface manual, version 2.6.
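
As an illustration of the @samp{#pragma acc} directive, the following is a
minimal sketch of a C function with one OpenACC loop. The function name
@code{saxpy} and the chosen data clauses are assumptions made for this
example only. When the file is compiled without @option{-fopenacc}, the
pragma is ignored and the loop runs serially on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Compute y[i] = a * x[i] + y[i] for i in [0, n).
   With -fopenacc the loop may be offloaded to an accelerator; x is copied
   to the device and y is copied in both directions.  */
void saxpy(int n, float a, const float *x, float *y)
{
  assert(x != NULL && y != NULL);
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```

A command line such as @samp{gcc -fopenacc saxpy.c} enables the directive.
Whether the loop is actually offloaded depends on how GCC was configured.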
3761
3762
3763
3764@c ---------------------------------------------------------------------
3765@c OpenACC Runtime Library Routines
3766@c ---------------------------------------------------------------------
3767
3768@node OpenACC Runtime Library Routines
3769@chapter OpenACC Runtime Library Routines
3770
3771The runtime routines described here are defined by section 3 of the OpenACC
3772specifications in version 2.6.
3773They have C linkage, and do not throw exceptions.
3774Generally, they are available only for the host, with the exception of
3775@code{acc_on_device}, which is available for both the host and the
3776acceleration device.
3777
3778@menu
3779* acc_get_num_devices:: Get number of devices for the given device
3780 type.
3781* acc_set_device_type:: Set type of device accelerator to use.
3782* acc_get_device_type:: Get type of device accelerator to be used.
3783* acc_set_device_num:: Set device number to use.
3784* acc_get_device_num:: Get device number to be used.
3785* acc_get_property:: Get device property.
3786* acc_async_test:: Tests for completion of a specific asynchronous
3787 operation.
3788* acc_async_test_all:: Tests for completion of all asynchronous
3789 operations.
3790* acc_wait:: Wait for completion of a specific asynchronous
3791 operation.
3792* acc_wait_all:: Waits for completion of all asynchronous
3793 operations.
3794* acc_wait_all_async:: Wait for completion of all asynchronous
3795 operations.
3796* acc_wait_async:: Wait for completion of asynchronous operations.
3797* acc_init:: Initialize runtime for a specific device type.
3798* acc_shutdown:: Shuts down the runtime for a specific device
3799 type.
3800* acc_on_device:: Whether executing on a particular device
3801* acc_malloc:: Allocate device memory.
3802* acc_free:: Free device memory.
3803* acc_copyin:: Allocate device memory and copy host memory to
3804 it.
3805* acc_present_or_copyin:: If the data is not present on the device,
3806 allocate device memory and copy from host
3807 memory.
3808* acc_create:: Allocate device memory and map it to host
3809 memory.
3810* acc_present_or_create:: If the data is not present on the device,
3811 allocate device memory and map it to host
3812 memory.
3813* acc_copyout:: Copy device memory to host memory.
3814* acc_delete:: Free device memory.
3815* acc_update_device:: Update device memory from mapped host memory.
3816* acc_update_self:: Update host memory from mapped device memory.
3817* acc_map_data:: Map previously allocated device memory to host
3818 memory.
3819* acc_unmap_data:: Unmap device memory from host memory.
3820* acc_deviceptr:: Get device pointer associated with specific
3821 host address.
3822* acc_hostptr:: Get host pointer associated with specific
3823 device address.
3824* acc_is_present:: Indicate whether host variable / array is
3825 present on device.
3826* acc_memcpy_to_device:: Copy host memory to device memory.
3827* acc_memcpy_from_device:: Copy device memory to host memory.
3828* acc_attach:: Let device pointer point to device-pointer target.
3829* acc_detach:: Let device pointer point to host-pointer target.
3830
3831API routines for target platforms.
3832
3833* acc_get_current_cuda_device:: Get CUDA device handle.
3834* acc_get_current_cuda_context::Get CUDA context handle.
3835* acc_get_cuda_stream:: Get CUDA stream handle.
3836* acc_set_cuda_stream:: Set CUDA stream handle.
3837
3838API routines for the OpenACC Profiling Interface.
3839
3840* acc_prof_register:: Register callbacks.
3841* acc_prof_unregister:: Unregister callbacks.
3842* acc_prof_lookup:: Obtain inquiry functions.
3843* acc_register_library:: Library registration.
3844@end menu
3845
3846
3847
3848@node acc_get_num_devices
3849@section @code{acc_get_num_devices} -- Get number of devices for given device type
3850@table @asis
3851@item @emph{Description}
3852This function returns a value indicating the number of devices available
3853for the device type specified in @var{devicetype}.
3854
3855@item @emph{C/C++}:
3856@multitable @columnfractions .20 .80
3857@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
3858@end multitable
3859
3860@item @emph{Fortran}:
3861@multitable @columnfractions .20 .80
3862@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
3863@item @tab @code{integer(kind=acc_device_kind) devicetype}
3864@end multitable
3865
3866@item @emph{Reference}:
3867@uref{https://www.openacc.org, OpenACC specification v2.6}, section
38683.2.1.
3869@end table
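
A possible use of @code{acc_get_num_devices} is sketched below. The helper
name @code{count_accelerators} is hypothetical. The guard on the
@code{_OPENACC} macro is an assumption of this sketch that lets the same
file build without @option{-fopenacc}, in which case it reports zero
accelerators.

```c
#include <assert.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Return the number of non-host devices reported by the runtime,
   or 0 when the program was built without OpenACC support.  */
int count_accelerators(void)
{
  int n = 0;
#ifdef _OPENACC
  n = acc_get_num_devices(acc_device_not_host);
#endif
  assert(n >= 0);   /* a device count is never negative */
  return n;
}
```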
3870
3871
3872
3873@node acc_set_device_type
3874@section @code{acc_set_device_type} -- Set type of device accelerator to use.
3875@table @asis
3876@item @emph{Description}
3877This function indicates to the runtime library which device type, specified
3878in @var{devicetype}, to use when executing a parallel or kernels region.
3879
3880@item @emph{C/C++}:
3881@multitable @columnfractions .20 .80
3882@item @emph{Prototype}: @tab @code{void acc_set_device_type(acc_device_t devicetype);}
3883@end multitable
3884
3885@item @emph{Fortran}:
3886@multitable @columnfractions .20 .80
3887@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
3888@item @tab @code{integer(kind=acc_device_kind) devicetype}
3889@end multitable
3890
3891@item @emph{Reference}:
3892@uref{https://www.openacc.org, OpenACC specification v2.6}, section
38933.2.2.
3894@end table
3895
3896
3897
3898@node acc_get_device_type
3899@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
3900@table @asis
3901@item @emph{Description}
3902This function returns what device type will be used when executing a
3903parallel or kernels region.
3904
3905This function returns @code{acc_device_none} if
3906@code{acc_get_device_type} is called from
3907@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
3908callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
3909Interface}), that is, if the device is currently being initialized.
3910
3911@item @emph{C/C++}:
3912@multitable @columnfractions .20 .80
3913@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
3914@end multitable
3915
3916@item @emph{Fortran}:
3917@multitable @columnfractions .20 .80
3918@item @emph{Interface}: @tab @code{function acc_get_device_type()}
3919@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
3920@end multitable
3921
3922@item @emph{Reference}:
3923@uref{https://www.openacc.org, OpenACC specification v2.6}, section
39243.2.3.
3925@end table
3926
3927
3928
3929@node acc_set_device_num
3930@section @code{acc_set_device_num} -- Set device number to use.
3931@table @asis
3932@item @emph{Description}
3933This function indicates to the runtime which device number,
3934specified by @var{devicenum}, to use for the specified device
3935type @var{devicetype}.
3936
3937@item @emph{C/C++}:
3938@multitable @columnfractions .20 .80
3939@item @emph{Prototype}: @tab @code{void acc_set_device_num(int devicenum, acc_device_t devicetype);}
3940@end multitable
3941
3942@item @emph{Fortran}:
3943@multitable @columnfractions .20 .80
3944@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
3945@item @tab @code{integer devicenum}
3946@item @tab @code{integer(kind=acc_device_kind) devicetype}
3947@end multitable
3948
3949@item @emph{Reference}:
3950@uref{https://www.openacc.org, OpenACC specification v2.6}, section
39513.2.4.
3952@end table
3953
3954
3955
3956@node acc_get_device_num
3957@section @code{acc_get_device_num} -- Get device number to be used.
3958@table @asis
3959@item @emph{Description}
3960This function returns the device number, associated with the specified device
3961type @var{devicetype}, that will be used when executing a parallel or kernels
3962region.
3963
3964@item @emph{C/C++}:
3965@multitable @columnfractions .20 .80
3966@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
3967@end multitable
3968
3969@item @emph{Fortran}:
3970@multitable @columnfractions .20 .80
3971@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
3972@item @tab @code{integer(kind=acc_device_kind) devicetype}
3973@item @tab @code{integer acc_get_device_num}
3974@end multitable
3975
3976@item @emph{Reference}:
3977@uref{https://www.openacc.org, OpenACC specification v2.6}, section
39783.2.5.
3979@end table
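
The device selection routines above can be combined as in the following
sketch. The helper name @code{select_device} is hypothetical, and the
fallback branch, which treats the host as device number 0 when OpenACC is
disabled, is an assumption of this example rather than behavior defined by
the specification.

```c
#include <assert.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Select device number NUM of the current device type.  Return the device
   number now in use, or -1 if NUM is out of range.  */
int select_device(int num)
{
  int count = 1;                 /* without OpenACC only the host exists */
#ifdef _OPENACC
  acc_device_t type = acc_get_device_type();
  count = acc_get_num_devices(type);
#endif
  assert(count >= 0);
  if (num < 0 || num >= count)
    return -1;
#ifdef _OPENACC
  acc_set_device_num(num, type);
  return acc_get_device_num(type);
#else
  return num;
#endif
}
```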
3980
3981
3982
3983@node acc_get_property
3984@section @code{acc_get_property} -- Get device property.
3985@cindex acc_get_property
3986@cindex acc_get_property_string
3987@table @asis
3988@item @emph{Description}
3989These routines return the value of the specified @var{property} for the
3990device being queried according to @var{devicenum} and @var{devicetype}.
3991Integer-valued and string-valued properties are returned by
3992@code{acc_get_property} and @code{acc_get_property_string} respectively.
3993The Fortran @code{acc_get_property_string} subroutine returns the string
3994retrieved in its fourth argument while the remaining entry points are
3995functions, which pass the return value as their result.
3996
3997Note for Fortran only: the OpenACC technical committee corrected and, hence,
3998modified the interface introduced in OpenACC 2.6. The kind-value parameter
3999@code{acc_device_property} has been renamed to @code{acc_device_property_kind}
4000for consistency and the return type of the @code{acc_get_property} function is
4001now a @code{c_size_t} integer instead of an @code{acc_device_property} integer.
4002The parameter @code{acc_device_property} is still provided,
4003but might be removed in a future version of GCC.
4004
4005@item @emph{C/C++}:
4006@multitable @columnfractions .20 .80
4007@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
4008@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
4009@end multitable
4010
4011@item @emph{Fortran}:
4012@multitable @columnfractions .20 .80
4013@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
4014@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
4015@item @tab @code{use ISO_C_Binding, only: c_size_t}
4016@item @tab @code{integer devicenum}
4017@item @tab @code{integer(kind=acc_device_kind) devicetype}
4018@item @tab @code{integer(kind=acc_device_property_kind) property}
4019@item @tab @code{integer(kind=c_size_t) acc_get_property}
4020@item @tab @code{character(*) string}
4021@end multitable
4022
4023@item @emph{Reference}:
4024@uref{https://www.openacc.org, OpenACC specification v2.6}, section
40253.2.6.
4026@end table
4027
4028
4029
4030@node acc_async_test
4031@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
4032@table @asis
4033@item @emph{Description}
4034This function tests for completion of the asynchronous operation specified
4035in @var{arg}. In C/C++, a non-zero value is returned to indicate
4036the specified asynchronous operation has completed while Fortran returns
4037@code{true}. If the asynchronous operation has not completed, C/C++ returns
4038zero and Fortran returns @code{false}.
4039
4040@item @emph{C/C++}:
4041@multitable @columnfractions .20 .80
4042@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
4043@end multitable
4044
4045@item @emph{Fortran}:
4046@multitable @columnfractions .20 .80
4047@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
4048@item @tab @code{integer(kind=acc_handle_kind) arg}
4049@item @tab @code{logical acc_async_test}
4050@end multitable
4051
4052@item @emph{Reference}:
4053@uref{https://www.openacc.org, OpenACC specification v2.6}, section
40543.2.9.
4055@end table
4056
4057
4058
4059@node acc_async_test_all
4060@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
4061@table @asis
4062@item @emph{Description}
4063This function tests for completion of all asynchronous operations.
4064In C/C++, a non-zero value is returned to indicate all asynchronous
4065operations have completed while Fortran returns @code{true}. If
4066any asynchronous operation has not completed, C/C++ returns zero and
4067Fortran returns @code{false}.
4068
4069@item @emph{C/C++}:
4070@multitable @columnfractions .20 .80
4071@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
4072@end multitable
4073
4074@item @emph{Fortran}:
4075@multitable @columnfractions .20 .80
4076@item @emph{Interface}: @tab @code{function acc_async_test_all()}
4077@item @tab @code{logical acc_async_test_all}
4078@end multitable
4079
4080@item @emph{Reference}:
4081@uref{https://www.openacc.org, OpenACC specification v2.6}, section
40823.2.10.
4083@end table
4084
4085
4086
4087@node acc_wait
4088@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
4089@table @asis
4090@item @emph{Description}
4091This function waits for completion of the asynchronous operation
4092specified in @var{arg}.
4093
4094@item @emph{C/C++}:
4095@multitable @columnfractions .20 .80
4096@item @emph{Prototype}: @tab @code{void acc_wait(int arg);}
4097@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{void acc_async_wait(int arg);}
4098@end multitable
4099
4100@item @emph{Fortran}:
4101@multitable @columnfractions .20 .80
4102@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
4103@item @tab @code{integer(acc_handle_kind) arg}
4104@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
4105@item @tab @code{integer(acc_handle_kind) arg}
4106@end multitable
4107
4108@item @emph{Reference}:
4109@uref{https://www.openacc.org, OpenACC specification v2.6}, section
41103.2.11.
4111@end table
4112
4113
4114
4115@node acc_wait_all
4116@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
4117@table @asis
4118@item @emph{Description}
4119This function waits for the completion of all asynchronous operations.
4120
4121@item @emph{C/C++}:
4122@multitable @columnfractions .20 .80
4123@item @emph{Prototype}: @tab @code{void acc_wait_all(void);}
4124@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{void acc_async_wait_all(void);}
4125@end multitable
4126
4127@item @emph{Fortran}:
4128@multitable @columnfractions .20 .80
4129@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
4130@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
4131@end multitable
4132
4133@item @emph{Reference}:
4134@uref{https://www.openacc.org, OpenACC specification v2.6}, section
41353.2.13.
4136@end table
4137
4138
4139
4140@node acc_wait_all_async
4141@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
4142@table @asis
4143@item @emph{Description}
4144This function enqueues a wait operation on the queue @var{async} for any
4145and all asynchronous operations that have been previously enqueued on
4146any queue.
4147
4148@item @emph{C/C++}:
4149@multitable @columnfractions .20 .80
4150@item @emph{Prototype}: @tab @code{void acc_wait_all_async(int async);}
4151@end multitable
4152
4153@item @emph{Fortran}:
4154@multitable @columnfractions .20 .80
4155@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
4156@item @tab @code{integer(acc_handle_kind) async}
4157@end multitable
4158
4159@item @emph{Reference}:
4160@uref{https://www.openacc.org, OpenACC specification v2.6}, section
41613.2.14.
4162@end table
4163
4164
4165
4166@node acc_wait_async
4167@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
4168@table @asis
4169@item @emph{Description}
4170This function enqueues a wait operation on queue @var{async} for any and all
4171asynchronous operations enqueued on queue @var{arg}.
4172
4173@item @emph{C/C++}:
4174@multitable @columnfractions .20 .80
4175@item @emph{Prototype}: @tab @code{void acc_wait_async(int arg, int async);}
4176@end multitable
4177
4178@item @emph{Fortran}:
4179@multitable @columnfractions .20 .80
4180@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
4181@item @tab @code{integer(acc_handle_kind) arg, async}
4182@end multitable
4183
4184@item @emph{Reference}:
4185@uref{https://www.openacc.org, OpenACC specification v2.6}, section
41863.2.12.
4187@end table
4188
4189
4190
4191@node acc_init
4192@section @code{acc_init} -- Initialize runtime for a specific device type.
4193@table @asis
4194@item @emph{Description}
4195This function initializes the runtime for the device type specified in
4196@var{devicetype}.
4197
4198@item @emph{C/C++}:
4199@multitable @columnfractions .20 .80
4200@item @emph{Prototype}: @tab @code{void acc_init(acc_device_t devicetype);}
4201@end multitable
4202
4203@item @emph{Fortran}:
4204@multitable @columnfractions .20 .80
4205@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
4206@item @tab @code{integer(acc_device_kind) devicetype}
4207@end multitable
4208
4209@item @emph{Reference}:
4210@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42113.2.7.
4212@end table
4213
4214
4215
4216@node acc_shutdown
4217@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
4218@table @asis
4219@item @emph{Description}
4220This function shuts down the runtime for the device type specified in
4221@var{devicetype}.
4222
4223@item @emph{C/C++}:
4224@multitable @columnfractions .20 .80
4225@item @emph{Prototype}: @tab @code{void acc_shutdown(acc_device_t devicetype);}
4226@end multitable
4227
4228@item @emph{Fortran}:
4229@multitable @columnfractions .20 .80
4230@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
4231@item @tab @code{integer(acc_device_kind) devicetype}
4232@end multitable
4233
4234@item @emph{Reference}:
4235@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42363.2.8.
4237@end table
4238
4239
4240
4241@node acc_on_device
4242@section @code{acc_on_device} -- Whether executing on a particular device
4243@table @asis
4244@item @emph{Description}:
4245This function returns whether the program is executing on a particular
4246device specified in @var{devicetype}. In C/C++ a non-zero value is
4247returned to indicate the program is executing on the specified device type.
4248In Fortran, @code{true} is returned. If the program is not executing
4249on the specified device type, C/C++ returns zero, while Fortran
4250returns @code{false}.
4251
4252@item @emph{C/C++}:
4253@multitable @columnfractions .20 .80
4254@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
4255@end multitable
4256
4257@item @emph{Fortran}:
4258@multitable @columnfractions .20 .80
4259@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
4260@item @tab @code{integer(acc_device_kind) devicetype}
4261@item @tab @code{logical acc_on_device}
4262@end multitable
4263
4264
4265@item @emph{Reference}:
4266@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42673.2.17.
4268@end table
4269
4270
4271
4272@node acc_malloc
4273@section @code{acc_malloc} -- Allocate device memory.
4274@table @asis
4275@item @emph{Description}
4276This function allocates @var{len} bytes of device memory. It returns
4277the device address of the allocated memory.
4278
4279@item @emph{C/C++}:
4280@multitable @columnfractions .20 .80
4281@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
4282@end multitable
4283
4284@item @emph{Reference}:
4285@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42863.2.18.
4287@end table
4288
4289
4290
4291@node acc_free
4292@section @code{acc_free} -- Free device memory.
4293@table @asis
4294@item @emph{Description}
4295This function frees previously allocated device memory at the device address @var{a}.
4296
4297@item @emph{C/C++}:
4298@multitable @columnfractions .20 .80
4299@item @emph{Prototype}: @tab @code{void acc_free(d_void *a);}
4300@end multitable
4301
4302@item @emph{Reference}:
4303@uref{https://www.openacc.org, OpenACC specification v2.6}, section
43043.2.19.
4305@end table
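
Memory obtained from @code{acc_malloc} has no host counterpart, so compute
regions refer to it through a @code{deviceptr} clause. The following sketch
pairs @code{acc_malloc} with @code{acc_free}. The helper name
@code{sum_of_ones} is hypothetical, and the @code{malloc} fallback used when
OpenACC is disabled is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Fill a device-only buffer of N floats with ones and return their sum,
   or -1 if the allocation fails.  */
float sum_of_ones(int n)
{
  assert(n > 0);
  size_t bytes = (size_t) n * sizeof(float);
#ifdef _OPENACC
  float *d = acc_malloc(bytes);   /* device address; in general it must
                                     not be dereferenced on the host */
#else
  float *d = malloc(bytes);       /* host stand-in without OpenACC */
#endif
  if (d == NULL)
    return -1.0f;

  float sum = 0.0f;
#pragma acc parallel loop deviceptr(d)
  for (int i = 0; i < n; ++i)
    d[i] = 1.0f;
#pragma acc parallel loop deviceptr(d) reduction(+:sum)
  for (int i = 0; i < n; ++i)
    sum += d[i];

#ifdef _OPENACC
  acc_free(d);
#else
  free(d);
#endif
  return sum;
}
```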
4306
4307
4308
4309@node acc_copyin
4310@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
4311@table @asis
4312@item @emph{Description}
4313In C/C++, this function allocates @var{len} bytes of device memory,
4314maps it to the host address specified in @var{a}, and copies the host
4315data to it. The device address of the newly allocated device memory is returned.
4316
4317In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4318a contiguous array section. In the second form, @var{a} specifies a
4319variable or array element and @var{len} specifies the length in bytes.
4320
4321@item @emph{C/C++}:
4322@multitable @columnfractions .20 .80
4323@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
4324@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
4325@end multitable
4326
4327@item @emph{Fortran}:
4328@multitable @columnfractions .20 .80
4329@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
4330@item @tab @code{type, dimension(:[,:]...) :: a}
4331@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
4332@item @tab @code{type, dimension(:[,:]...) :: a}
4333@item @tab @code{integer len}
4334@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
4335@item @tab @code{type, dimension(:[,:]...) :: a}
4336@item @tab @code{integer(acc_handle_kind) :: async}
4337@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
4338@item @tab @code{type, dimension(:[,:]...) :: a}
4339@item @tab @code{integer len}
4340@item @tab @code{integer(acc_handle_kind) :: async}
4341@end multitable
4342
4343@item @emph{Reference}:
4344@uref{https://www.openacc.org, OpenACC specification v2.6}, section
43453.2.20.
4346@end table
4347
4348
4349
4350@node acc_present_or_copyin
4351@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
4352@table @asis
4353@item @emph{Description}
4354This function tests whether the host data specified by @var{a} and of length
4355@var{len} is present on the device. If it is not present, device memory
4356is allocated and the host memory copied to it. The device address of
4357the newly allocated device memory is returned.
4358
4359In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4360a contiguous array section. In the second form, @var{a} specifies a variable or
4361array element and @var{len} specifies the length in bytes.
4362
4363Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
4364backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.
4365
4366@item @emph{C/C++}:
4367@multitable @columnfractions .20 .80
4368@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
4369@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
4370@end multitable
4371
4372@item @emph{Fortran}:
4373@multitable @columnfractions .20 .80
4374@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
4375@item @tab @code{type, dimension(:[,:]...) :: a}
4376@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
4377@item @tab @code{type, dimension(:[,:]...) :: a}
4378@item @tab @code{integer len}
4379@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
4380@item @tab @code{type, dimension(:[,:]...) :: a}
4381@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
4382@item @tab @code{type, dimension(:[,:]...) :: a}
4383@item @tab @code{integer len}
4384@end multitable
4385
4386@item @emph{Reference}:
4387@uref{https://www.openacc.org, OpenACC specification v2.6}, section
43883.2.20.
4389@end table
4390
4391
4392
4393@node acc_create
4394@section @code{acc_create} -- Allocate device memory and map it to host memory.
4395@table @asis
4396@item @emph{Description}
4397This function allocates device memory and maps it to host memory specified
4398by the host address @var{a} with a length of @var{len} bytes. In C/C++,
4399the function returns the device address of the allocated device memory.
4400
4401In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4402a contiguous array section. In the second form, @var{a} specifies a variable or
4403array element and @var{len} specifies the length in bytes.
4404
4405@item @emph{C/C++}:
4406@multitable @columnfractions .20 .80
4407@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
4408@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
4409@end multitable
4410
4411@item @emph{Fortran}:
4412@multitable @columnfractions .20 .80
4413@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
4414@item @tab @code{type, dimension(:[,:]...) :: a}
4415@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
4416@item @tab @code{type, dimension(:[,:]...) :: a}
4417@item @tab @code{integer len}
4418@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
4419@item @tab @code{type, dimension(:[,:]...) :: a}
4420@item @tab @code{integer(acc_handle_kind) :: async}
4421@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
4422@item @tab @code{type, dimension(:[,:]...) :: a}
4423@item @tab @code{integer len}
4424@item @tab @code{integer(acc_handle_kind) :: async}
4425@end multitable
4426
4427@item @emph{Reference}:
4428@uref{https://www.openacc.org, OpenACC specification v2.6}, section
44293.2.21.
4430@end table
4431
4432
4433
4434@node acc_present_or_create
4435@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
4436@table @asis
4437@item @emph{Description}
4438This function tests whether the host data specified by @var{a} and of length
4439@var{len} is present on the device. If it is not present, device memory
4440is allocated and mapped to host memory. In C/C++, the device address
4441of the newly allocated device memory is returned.
4442
4443In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4444a contiguous array section. In the second form, @var{a} specifies a variable or
4445array element and @var{len} specifies the length in bytes.
4446
4447Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
4448backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.
4449
4450@item @emph{C/C++}:
4451@multitable @columnfractions .20 .80
4452@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
4453@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}
4454@end multitable
4455
4456@item @emph{Fortran}:
4457@multitable @columnfractions .20 .80
4458@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
4459@item @tab @code{type, dimension(:[,:]...) :: a}
4460@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
4461@item @tab @code{type, dimension(:[,:]...) :: a}
4462@item @tab @code{integer len}
4463@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
4464@item @tab @code{type, dimension(:[,:]...) :: a}
4465@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
4466@item @tab @code{type, dimension(:[,:]...) :: a}
4467@item @tab @code{integer len}
4468@end multitable
4469
4470@item @emph{Reference}:
4471@uref{https://www.openacc.org, OpenACC specification v2.6}, section
44723.2.21.
4473@end table
4474
4475
4476
4477@node acc_copyout
4478@section @code{acc_copyout} -- Copy device memory to host memory.
4479@table @asis
4480@item @emph{Description}
4481This function copies mapped device memory to the host memory specified
4482by the host address @var{a}, for a length of @var{len} bytes in C/C++.
4483
4484In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
4485a contiguous array section. In the second form, @var{a} specifies a variable or
4486array element and @var{len} specifies the length in bytes.
4487
4488@item @emph{C/C++}:
4489@multitable @columnfractions .20 .80
4490@item @emph{Prototype}: @tab @code{void acc_copyout(h_void *a, size_t len);}
4491@item @emph{Prototype}: @tab @code{void acc_copyout_async(h_void *a, size_t len, int async);}
4492@item @emph{Prototype}: @tab @code{void acc_copyout_finalize(h_void *a, size_t len);}
4493@item @emph{Prototype}: @tab @code{void acc_copyout_finalize_async(h_void *a, size_t len, int async);}
4494@end multitable
4495
4496@item @emph{Fortran}:
4497@multitable @columnfractions .20 .80
4498@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
4499@item @tab @code{type, dimension(:[,:]...) :: a}
4500@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
4501@item @tab @code{type, dimension(:[,:]...) :: a}
4502@item @tab @code{integer len}
4503@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
4504@item @tab @code{type, dimension(:[,:]...) :: a}
4505@item @tab @code{integer(acc_handle_kind) :: async}
4506@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
4507@item @tab @code{type, dimension(:[,:]...) :: a}
4508@item @tab @code{integer len}
4509@item @tab @code{integer(acc_handle_kind) :: async}
4510@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
4511@item @tab @code{type, dimension(:[,:]...) :: a}
4512@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
4513@item @tab @code{type, dimension(:[,:]...) :: a}
4514@item @tab @code{integer len}
4515@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
4516@item @tab @code{type, dimension(:[,:]...) :: a}
4517@item @tab @code{integer(acc_handle_kind) :: async}
4518@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
4519@item @tab @code{type, dimension(:[,:]...) :: a}
4520@item @tab @code{integer len}
4521@item @tab @code{integer(acc_handle_kind) :: async}
4522@end multitable
4523
4524@item @emph{Reference}:
4525@uref{https://www.openacc.org, OpenACC specification v2.6}, section
45263.2.22.
4527@end table
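
The routines @code{acc_copyin} and @code{acc_copyout} are commonly used as a
pair around compute regions that expect the data to be present already. The
following is a minimal sketch of such a round trip; the helper name
@code{increment_on_device} is hypothetical. Without @option{-fopenacc} the
runtime calls are compiled out and the loop operates on the host array
directly.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Add 1 to each of a[0..n-1], using explicit runtime calls for the
   host-to-device and device-to-host transfers.  */
void increment_on_device(int n, int *a)
{
  assert(a != NULL && n > 0);
  size_t bytes = (size_t) n * sizeof *a;
#ifdef _OPENACC
  acc_copyin(a, bytes);    /* allocate device memory and copy a to it */
#else
  (void) bytes;            /* unused when OpenACC is disabled */
#endif

#pragma acc parallel loop present(a[0:n])
  for (int i = 0; i < n; ++i)
    a[i] += 1;

#ifdef _OPENACC
  acc_copyout(a, bytes);   /* copy the result back and release the mapping */
#endif
}
```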
4528
4529
4530
4531@node acc_delete
4532@section @code{acc_delete} -- Free device memory.
4533@table @asis
4534@item @emph{Description}
This function frees the previously allocated device memory that corresponds
to the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.
4541
4542@item @emph{C/C++}:
4543@multitable @columnfractions .20 .80
4544@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
4545@item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
4546@item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
4547@item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}
4548@end multitable
4549
4550@item @emph{Fortran}:
4551@multitable @columnfractions .20 .80
4552@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
4553@item @tab @code{type, dimension(:[,:]...) :: a}
4554@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
4555@item @tab @code{type, dimension(:[,:]...) :: a}
4556@item @tab @code{integer len}
4557@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
4558@item @tab @code{type, dimension(:[,:]...) :: a}
4559@item @tab @code{integer(acc_handle_kind) :: async}
4560@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
4561@item @tab @code{type, dimension(:[,:]...) :: a}
4562@item @tab @code{integer len}
4563@item @tab @code{integer(acc_handle_kind) :: async}
4564@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
4565@item @tab @code{type, dimension(:[,:]...) :: a}
4566@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
4567@item @tab @code{type, dimension(:[,:]...) :: a}
4568@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
4576@end multitable
4577
4578@item @emph{Reference}:
4579@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.23.
4581@end table
4582
4583
4584
4585@node acc_update_device
4586@section @code{acc_update_device} -- Update device memory from mapped host memory.
4587@table @asis
4588@item @emph{Description}
4589This function updates the device copy from the previously mapped host memory.
4590The host memory is specified with the host address @var{a} and a length of
4591@var{len} bytes.
4592
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.
4596
4597@item @emph{C/C++}:
4598@multitable @columnfractions .20 .80
4599@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
4601@end multitable
4602
4603@item @emph{Fortran}:
4604@multitable @columnfractions .20 .80
4605@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
4606@item @tab @code{type, dimension(:[,:]...) :: a}
4607@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
4608@item @tab @code{type, dimension(:[,:]...) :: a}
4609@item @tab @code{integer len}
4610@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
4611@item @tab @code{type, dimension(:[,:]...) :: a}
4612@item @tab @code{integer(acc_handle_kind) :: async}
4613@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
4614@item @tab @code{type, dimension(:[,:]...) :: a}
4615@item @tab @code{integer len}
4616@item @tab @code{integer(acc_handle_kind) :: async}
4617@end multitable
4618
4619@item @emph{Reference}:
4620@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.24.
4622@end table
4623
4624
4625
4626@node acc_update_self
4627@section @code{acc_update_self} -- Update host memory from mapped device memory.
4628@table @asis
4629@item @emph{Description}
4630This function updates the host copy from the previously mapped device memory.
4631The host memory is specified with the host address @var{a} and a length of
4632@var{len} bytes.
4633
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.
4637
4638@item @emph{C/C++}:
4639@multitable @columnfractions .20 .80
4640@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
4641@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}
4642@end multitable
4643
4644@item @emph{Fortran}:
4645@multitable @columnfractions .20 .80
4646@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
4647@item @tab @code{type, dimension(:[,:]...) :: a}
4648@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
4649@item @tab @code{type, dimension(:[,:]...) :: a}
4650@item @tab @code{integer len}
4651@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
4652@item @tab @code{type, dimension(:[,:]...) :: a}
4653@item @tab @code{integer(acc_handle_kind) :: async}
4654@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
4655@item @tab @code{type, dimension(:[,:]...) :: a}
4656@item @tab @code{integer len}
4657@item @tab @code{integer(acc_handle_kind) :: async}
4658@end multitable
4659
4660@item @emph{Reference}:
4661@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.25.
4663@end table
4664
4665
4666
4667@node acc_map_data
4668@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
4669@table @asis
4670@item @emph{Description}
This function maps previously allocated device and host memory. The device
memory is specified with the device address @var{d}. The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.
4674
4675@item @emph{C/C++}:
4676@multitable @columnfractions .20 .80
4677@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
4678@end multitable
4679
4680@item @emph{Reference}:
4681@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.26.
4683@end table
4684
4685
4686
4687@node acc_unmap_data
4688@section @code{acc_unmap_data} -- Unmap device memory from host memory.
4689@table @asis
4690@item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{h}.
4693
4694@item @emph{C/C++}:
4695@multitable @columnfractions .20 .80
4696@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
4697@end multitable
4698
4699@item @emph{Reference}:
4700@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.27.
4702@end table
4703
4704
4705
4706@node acc_deviceptr
4707@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
4708@table @asis
4709@item @emph{Description}
4710This function returns the device address that has been mapped to the
4711host address specified by @var{h}.
4712
4713@item @emph{C/C++}:
4714@multitable @columnfractions .20 .80
4715@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
4716@end multitable
4717
4718@item @emph{Reference}:
4719@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.28.
4721@end table
4722
4723
4724
4725@node acc_hostptr
4726@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
4727@table @asis
4728@item @emph{Description}
4729This function returns the host address that has been mapped to the
4730device address specified by @var{d}.
4731
4732@item @emph{C/C++}:
4733@multitable @columnfractions .20 .80
4734@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
4735@end multitable
4736
4737@item @emph{Reference}:
4738@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.29.
4740@end table
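
The routines @code{acc_map_data}, @code{acc_deviceptr},
@code{acc_hostptr} and @code{acc_unmap_data} can be exercised together.
The following is a minimal sketch under stated assumptions, not a
definitive usage pattern: the names @code{h}, @code{d} and @code{len}
are chosen for illustration, the program is assumed to be compiled with
@option{-fopenacc}, and the current device is assumed to have memory
separate from the host (on a shared-memory device, such as host
fallback, explicit mapping may be rejected).

@smallexample
#include <openacc.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
@{
  size_t len = 256 * sizeof (int);
  int *h = malloc (len);
  void *d;

  if (h == NULL)
    return EXIT_FAILURE;

  /* Allocate device memory that is not yet associated with 'h'.  */
  d = acc_malloc (len);
  if (d == NULL)
    @{
      free (h);
      return EXIT_FAILURE;
    @}

  /* Associate the host range [h, h + len) with the device memory.  */
  acc_map_data (h, d, len);

  /* Both lookups should now resolve through the mapping.  */
  if (acc_deviceptr (h) == d && acc_hostptr (d) == (void *) h)
    printf ("mapping established\n");

  /* Remove the mapping before releasing either allocation.  */
  acc_unmap_data (h);
  acc_free (d);
  free (h);
  return EXIT_SUCCESS;
@}
@end smallexample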
4741
4742
4743
4744@node acc_is_present
4745@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
4746@table @asis
4747@item @emph{Description}
This function indicates whether the host memory specified by the host
address @var{a} and a length of @var{len} bytes is present on the device.
In C/C++, a non-zero value is returned to indicate the presence of the
mapped memory on the device. A zero is returned to indicate the memory is
not mapped on the device.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes. If the host
memory is mapped to device memory, then a @code{true} is returned. Otherwise,
a @code{false} is returned to indicate the mapped memory is not present.
4759
4760@item @emph{C/C++}:
4761@multitable @columnfractions .20 .80
4762@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
4763@end multitable
4764
4765@item @emph{Fortran}:
4766@multitable @columnfractions .20 .80
4767@item @emph{Interface}: @tab @code{function acc_is_present(a)}
4768@item @tab @code{type, dimension(:[,:]...) :: a}
4769@item @tab @code{logical acc_is_present}
4770@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
4771@item @tab @code{type, dimension(:[,:]...) :: a}
4772@item @tab @code{integer len}
4773@item @tab @code{logical acc_is_present}
4774@end multitable
4775
4776@item @emph{Reference}:
4777@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.30.
4779@end table
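
The data-management routines described in the preceding sections are
typically used together. The following is a minimal sketch, assuming
compilation with @option{-fopenacc}; the array @code{a} and its size
@code{N} are illustrative names, and no particular output is implied.

@smallexample
#include <openacc.h>
#include <stdio.h>

#define N 1024

int
main (void)
@{
  static float a[N];

  for (int i = 0; i < N; i++)
    a[i] = 1.0f;

  /* Allocate device memory for 'a' and copy the host data to it.  */
  acc_copyin (a, sizeof a);

  if (acc_is_present (a, sizeof a))
    printf ("a is present on the device\n");

  /* Modify the host copy, then refresh the device copy from it.  */
  a[0] = 2.0f;
  acc_update_device (a, sizeof a);

  /* Operate on the data that is already present on the device.  */
  #pragma acc parallel loop present (a[0:N])
  for (int i = 0; i < N; i++)
    a[i] *= 2.0f;

  /* Refresh the host copy from the device copy.  */
  acc_update_self (a, sizeof a);
  printf ("a[0] = %f\n", a[0]);

  /* Release the device memory without copying it back.  */
  acc_delete (a, sizeof a);
  return 0;
@}
@end smallexample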
4780
4781
4782
4783@node acc_memcpy_to_device
4784@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
4785@table @asis
4786@item @emph{Description}
This function copies host memory specified by the host address @var{src} to
device memory specified by the device address @var{dest} for a length of
@var{bytes} bytes.
4790
4791@item @emph{C/C++}:
4792@multitable @columnfractions .20 .80
4793@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
4794@end multitable
4795
4796@item @emph{Reference}:
4797@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.31.
4799@end table
4800
4801
4802
4803@node acc_memcpy_from_device
4804@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
4805@table @asis
4806@item @emph{Description}
This function copies device memory specified by the device address @var{src} to
host memory specified by the host address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
4814@end multitable
4815
4816@item @emph{Reference}:
4817@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.32.
4819@end table
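
A round trip through device memory illustrates the argument order of
@code{acc_memcpy_to_device} and @code{acc_memcpy_from_device}: in both
calls the destination comes first. This is a minimal sketch, assuming
compilation with @option{-fopenacc}; the buffer names @code{src},
@code{dst} and @code{d} are illustrative.

@smallexample
#include <openacc.h>
#include <stdio.h>
#include <string.h>

#define N 256

int
main (void)
@{
  float src[N], dst[N];
  size_t bytes = sizeof src;
  void *d;

  for (int i = 0; i < N; i++)
    src[i] = (float) i;
  memset (dst, 0, bytes);

  /* Allocate unmapped device memory.  */
  d = acc_malloc (bytes);
  if (d == NULL)
    return 1;

  /* Host to device: device destination first, host source second.  */
  acc_memcpy_to_device (d, src, bytes);

  /* Device to host: host destination first, device source second.  */
  acc_memcpy_from_device (dst, d, bytes);

  if (memcmp (src, dst, bytes) == 0)
    printf ("round trip succeeded\n");

  acc_free (d);
  return 0;
@}
@end smallexample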
4820
4821
4822
4823@node acc_attach
4824@section @code{acc_attach} -- Let device pointer point to device-pointer target.
4825@table @asis
4826@item @emph{Description}
4827This function updates a pointer on the device from pointing to a host-pointer
4828address to pointing to the corresponding device data.
4829
4830@item @emph{C/C++}:
4831@multitable @columnfractions .20 .80
4832@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
4833@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
4834@end multitable
4835
4836@item @emph{Reference}:
4837@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.34.
4839@end table
4840
4841
4842
4843@node acc_detach
4844@section @code{acc_detach} -- Let device pointer point to host-pointer target.
4845@table @asis
4846@item @emph{Description}
4847This function updates a pointer on the device from pointing to a device-pointer
4848address to pointing to the corresponding host data.
4849
4850@item @emph{C/C++}:
4851@multitable @columnfractions .20 .80
4852@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
4853@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
4854@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
4855@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
4856@end multitable
4857
4858@item @emph{Reference}:
4859@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.35.
4861@end table
4862
4863
4864
4865@node acc_get_current_cuda_device
4866@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
4867@table @asis
4868@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
4871
4872@item @emph{C/C++}:
4873@multitable @columnfractions .20 .80
4874@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
4875@end multitable
4876
4877@item @emph{Reference}:
4878@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.1.
4880@end table
4881
4882
4883
4884@node acc_get_current_cuda_context
4885@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
4886@table @asis
4887@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
4890
4891@item @emph{C/C++}:
4892@multitable @columnfractions .20 .80
4893@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
4894@end multitable
4895
4896@item @emph{Reference}:
4897@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.2.
4899@end table
4900
4901
4902
4903@node acc_get_cuda_stream
4904@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
4905@table @asis
4906@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.
4909
4910@item @emph{C/C++}:
4911@multitable @columnfractions .20 .80
4912@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
4913@end multitable
4914
4915@item @emph{Reference}:
4916@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.3.
4918@end table
4919
4920
4921
4922@node acc_set_cuda_stream
4923@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
4924@table @asis
4925@item @emph{Description}
4926This function associates the stream handle specified by @var{stream} with
4927the queue @var{async}.
4928
4929This cannot be used to change the stream handle associated with
4930@code{acc_async_sync}.
4931
4932The return value is not specified.
4933
4934@item @emph{C/C++}:
4935@multitable @columnfractions .20 .80
4936@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
4937@end multitable
4938
4939@item @emph{Reference}:
4940@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.4.
4942@end table
4943
4944
4945
4946@node acc_prof_register
4947@section @code{acc_prof_register} -- Register callbacks.
4948@table @asis
4949@item @emph{Description}:
4950This function registers callbacks.
4951
4952@item @emph{C/C++}:
4953@multitable @columnfractions .20 .80
4954@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
4955@end multitable
4956
4957@item @emph{See also}:
4958@ref{OpenACC Profiling Interface}
4959
4960@item @emph{Reference}:
4961@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
4963@end table
4964
4965
4966
4967@node acc_prof_unregister
4968@section @code{acc_prof_unregister} -- Unregister callbacks.
4969@table @asis
4970@item @emph{Description}:
4971This function unregisters callbacks.
4972
4973@item @emph{C/C++}:
4974@multitable @columnfractions .20 .80
4975@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
4976@end multitable
4977
4978@item @emph{See also}:
4979@ref{OpenACC Profiling Interface}
4980
4981@item @emph{Reference}:
4982@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
4984@end table
4985
4986
4987
4988@node acc_prof_lookup
4989@section @code{acc_prof_lookup} -- Obtain inquiry functions.
4990@table @asis
4991@item @emph{Description}:
This function obtains inquiry functions.
4993
4994@item @emph{C/C++}:
4995@multitable @columnfractions .20 .80
4996@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
4997@end multitable
4998
4999@item @emph{See also}:
5000@ref{OpenACC Profiling Interface}
5001
5002@item @emph{Reference}:
5003@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
5005@end table
5006
5007
5008
5009@node acc_register_library
5010@section @code{acc_register_library} -- Library registration.
5011@table @asis
5012@item @emph{Description}:
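This function performs library registration. It is provided by a profiling
library and is called by the runtime when that library is loaded.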
5014
5015@item @emph{C/C++}:
5016@multitable @columnfractions .20 .80
5017@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
5018@end multitable
5019
5020@item @emph{See also}:
5021@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
5022
5023@item @emph{Reference}:
5024@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
5026@end table
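
A profiling library typically defines @code{acc_register_library} and
uses the registration routine passed to it to install callbacks. The
following is a minimal sketch, not a definitive implementation: the
callback name @code{device_init_cb} is hypothetical, and the code is
assumed to be built as a shared library (for example with
@option{-shared -fPIC}) whose file name is then listed in
@env{ACC_PROFLIB}.

@smallexample
#include <openacc.h>
#include <acc_prof.h>
#include <stdio.h>

/* Hypothetical callback: report the type of the event received.  */
static void
device_init_cb (acc_prof_info *prof_info, acc_event_info *event_info,
                acc_api_info *api_info)
@{
  (void) event_info;
  (void) api_info;
  fprintf (stderr, "OpenACC event %d\n", (int) prof_info->event_type);
@}

/* Entry point looked up by the runtime in each profiling library.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
@{
  (void) unreg;
  (void) lookup;

  /* Install the same callback for both device-initialization events.  */
  reg (acc_ev_device_init_start, device_init_cb, acc_reg);
  reg (acc_ev_device_init_end, device_init_cb, acc_reg);
@}
@end smallexample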
5027
5028
5029
5030@c ---------------------------------------------------------------------
5031@c OpenACC Environment Variables
5032@c ---------------------------------------------------------------------
5033
5034@node OpenACC Environment Variables
5035@chapter OpenACC Environment Variables
5036
5037The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
5038are defined by section 4 of the OpenACC specification in version 2.0.
5039The variable @env{ACC_PROFLIB}
5040is defined by section 4 of the OpenACC specification in version 2.6.
5041
5042@menu
5043* ACC_DEVICE_TYPE::
5044* ACC_DEVICE_NUM::
5045* ACC_PROFLIB::
5046@end menu
5047
5048
5049
5050@node ACC_DEVICE_TYPE
5051@section @code{ACC_DEVICE_TYPE}
5052@table @asis
5053@item @emph{Description}:
5054Control the default device type to use when executing compute regions.
5055If unset, the code can be run on any device type, favoring a non-host
5056device type.
5057
5058Supported values in GCC (if compiled in) are
5059@itemize
5060@item @code{host}
5061@item @code{nvidia}
5062@item @code{radeon}
5063@end itemize
5064@item @emph{Reference}:
5065@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.1.
5067@end table
5068
5069
5070
5071@node ACC_DEVICE_NUM
5072@section @code{ACC_DEVICE_NUM}
5073@table @asis
5074@item @emph{Description}:
5075Control which device, identified by device number, is the default device.
5076The value must be a nonnegative integer less than the number of devices.
5077If unset, device number zero is used.
5078@item @emph{Reference}:
5079@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.2.
5081@end table
5082
5083
5084
5085@node ACC_PROFLIB
5086@section @code{ACC_PROFLIB}
5087@table @asis
5088@item @emph{Description}:
5089Semicolon-separated list of dynamic libraries that are loaded as profiling
5090libraries. Each library must provide at least the @code{acc_register_library}
5091routine. Each library file is found as described by the documentation of
5092@code{dlopen} of your operating system.
5093@item @emph{See also}:
5094@ref{acc_register_library}, @ref{OpenACC Profiling Interface}
5095
5096@item @emph{Reference}:
5097@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.3.
5099@end table
5100
5101
5102
5103@c ---------------------------------------------------------------------
5104@c CUDA Streams Usage
5105@c ---------------------------------------------------------------------
5106
5107@node CUDA Streams Usage
5108@chapter CUDA Streams Usage
5109
5110This applies to the @code{nvptx} plugin only.
5111
5112The library provides elements that perform asynchronous movement of
5113data and asynchronous operation of computing constructs. This
5114asynchronous functionality is implemented by making use of CUDA
5115streams@footnote{See "Stream Management" in "CUDA Driver API",
5116TRM-06703-001, Version 5.5, for additional information}.
5117
The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives that make use of the
5120@code{async} and @code{wait} clauses. When the @code{async} clause is
5121first used with a directive, it creates a CUDA stream. If an
5122@code{async-argument} is used with the @code{async} clause, then the
5123stream is associated with the specified @code{async-argument}.
5124
5125Following the creation of an association between a CUDA stream and the
5126@code{async-argument} of an @code{async} clause, both the @code{wait}
5127clause and the @code{wait} directive can be used. When either the
5128clause or directive is used after stream creation, it creates a
5129rendezvous point whereby execution waits until all operations
5130associated with the @code{async-argument}, that is, stream, have
5131completed.
5132
Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies the association between the @code{async-argument}
and the CUDA stream is maintained for the lifetime of the program.
5137However, this association can be changed through the use of the library
5138function @code{acc_set_cuda_stream}. When the function
5139@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause is destroyed.
5141Caution should be taken when changing the association as subsequent
5142references to the @code{async-argument} refer to a different
5143CUDA stream.
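
The directive-based usage described in this chapter can be sketched as
follows. This is an illustrative example only, assuming compilation with
@option{-fopenacc}; the arrays @code{a} and @code{b} and the
@code{async-argument} values 1 and 2 are arbitrary choices. With the
@code{nvptx} plugin, each distinct @code{async-argument} is associated
with its own CUDA stream.

@smallexample
#include <stdio.h>

#define N 1024

int
main (void)
@{
  static float a[N], b[N];

  #pragma acc data copy (a[0:N], b[0:N])
  @{
    /* Work enqueued on async-argument 1.  */
    #pragma acc parallel loop async (1)
    for (int i = 0; i < N; i++)
      a[i] += 1.0f;

    /* Independent work enqueued on async-argument 2.  */
    #pragma acc parallel loop async (2)
    for (int i = 0; i < N; i++)
      b[i] += 2.0f;

    /* Rendezvous points: wait for each queue to drain before the
       data region ends and the arrays are copied back.  */
    #pragma acc wait (1)
    #pragma acc wait (2)
  @}

  printf ("a[0] = %f, b[0] = %f\n", a[0], b[0]);
  return 0;
@}
@end smallexample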
5144
5145
5146
5147@c ---------------------------------------------------------------------
5148@c OpenACC Library Interoperability
5149@c ---------------------------------------------------------------------
5150
5151@node OpenACC Library Interoperability
5152@chapter OpenACC Library Interoperability
5153
5154@section Introduction
5155
5156The OpenACC library uses the CUDA Driver API, and may interact with
5157programs that use the Runtime library directly, or another library
5158based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
5159"Interactions with the CUDA Driver API" in
5160"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
5161Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
5162for additional information on library interoperability.}.
5163This chapter describes the use cases and what changes are
5164required in order to use both the OpenACC library and the CUBLAS and Runtime
5165libraries within a program.
5166
5167@section First invocation: NVIDIA CUBLAS library API
5168
5169In this first use case (see below), a function in the CUBLAS library is called
5170prior to any of the functions in the OpenACC library. More specifically, the
5171function @code{cublasCreate()}.
5172
5173When invoked, the function initializes the library and allocates the
5174hardware resources on the host and the device on behalf of the caller. Once
5175the initialization and allocation has completed, a handle is returned to the
5176caller. The OpenACC library also requires initialization and allocation of
5177hardware resources. Since the CUBLAS library has already allocated the
5178hardware resources for the device, all that is left to do is to initialize
5179the OpenACC library and acquire the hardware resources on the host.
5180
5181Prior to calling the OpenACC function that initializes the library and
5182allocate the host hardware resources, you need to acquire the device number
5183that was allocated during the call to @code{cublasCreate()}. The invoking of the
5184runtime library function @code{cudaGetDevice()} accomplishes this. Once
5185acquired, the device number is passed along with the device type as
5186parameters to the OpenACC library function @code{acc_set_device_num()}.
5187
5188Once the call to @code{acc_set_device_num()} has completed, the OpenACC
5189library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries share the
5191same context.
5192
@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
    @{
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    @}

    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);

@end smallexample
5214@center Use Case 1
5215
5216@section First invocation: OpenACC library API
5217
In this second use case (see below), a function in the OpenACC library,
namely @code{acc_set_device_num()}, is called prior to any of the functions
in the CUBLAS library.
5221
In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device. In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. This
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables; these are discussed in the next section.
5230
5231Once the call to @code{acc_set_device_num()} has completed, other OpenACC
5232functions can be called as seen with multiple calls being made to
5233@code{acc_copyin()}. In addition, calls can be made to functions in the
5234CUBLAS library. In the use case a call to @code{cublasCreate()} is made
5235subsequent to the calls to @code{acc_copyin()}.
5236As seen in the previous use case, a call to @code{cublasCreate()}
5237initializes the CUBLAS library and allocates the hardware resources on the
5238host and the device. However, since the device has already been allocated,
@code{cublasCreate()} only initializes the CUBLAS library and allocates
5240the appropriate hardware resources on the host. The context that was created
5241as part of the OpenACC initialization is shared with the CUBLAS library,
5242similarly to the first use case.
5243
@smallexample
    dev = 0;

    acc_set_device_num(dev, acc_device_nvidia);

    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
    @{
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    @}

    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
    @{
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    @}

    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    @}

    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

@end smallexample
5285@center Use Case 2
5286
5287@section OpenACC library and environment variables
5288
5289There are two environment variables associated with the OpenACC library
5290that may be used to control the device type and device number:
5291@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
5292environment variables can be used as an alternative to calling
5293@code{acc_set_device_num()}. As seen in the second use case, the device
5294type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.
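
One way to observe the effect of these environment variables is to query
the current device without calling @code{acc_set_device_num()}. The
following is a minimal sketch, assuming compilation with
@option{-fopenacc}; the values printed depend on the installation and on
how @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} are set when the
program is run.

@smallexample
#include <openacc.h>
#include <stdio.h>

int
main (void)
@{
  /* Querying the device type selects the default device, which is
     influenced by ACC_DEVICE_TYPE and ACC_DEVICE_NUM.  */
  acc_device_t type = acc_get_device_type ();

  printf ("device type %d, device number %d of %d\n",
          (int) type, acc_get_device_num (type),
          acc_get_num_devices (type));
  return 0;
@}
@end smallexample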
5297
5298
The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC}
Application Programming Interface, Version 2.6.}.
5306
5307
5308
5309@c ---------------------------------------------------------------------
5310@c OpenACC Profiling Interface
5311@c ---------------------------------------------------------------------
5312
5313@node OpenACC Profiling Interface
5314@chapter OpenACC Profiling Interface
5315
5316@section Implementation Status and Implementation-Defined Behavior
5317
5318We're implementing the OpenACC Profiling Interface as defined by the
5319OpenACC 2.6 specification. We're clarifying some aspects here as
5320@emph{implementation-defined behavior}, while they're still under
5321discussion within the OpenACC Technical Committee.
5322
5323This implementation is tuned to keep the performance impact as low as
5324possible for the (very common) case that the Profiling Interface is
5325not enabled. This is relevant, as the Profiling Interface affects all
5326the @emph{hot} code paths (in the target code, not in the offloaded
5327code). Users of the OpenACC Profiling Interface can be expected to
5328understand that performance is impacted to some degree once the
5329Profiling Interface is enabled: for example, because of the
5330@emph{runtime} (libgomp) calling into a third-party @emph{library} for
5331every event that has been registered.
5332
5333We're not yet accounting for the fact that @cite{OpenACC events may
5334occur during event processing}.
We just handle one case specially, as required by CUDA 9.0
@command{nvprof}: @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
callbacks.
5340
5341We're not yet implementing initialization via a
5342@code{acc_register_library} function that is either statically linked
5343in, or dynamically via @env{LD_PRELOAD}.
5344Initialization via @code{acc_register_library} functions dynamically
5345loaded via the @env{ACC_PROFLIB} environment variable does work, as
5346does directly calling @code{acc_prof_register},
5347@code{acc_prof_unregister}, @code{acc_prof_lookup}.
5348
As there are currently no inquiry functions defined, calls to
@code{acc_prof_lookup} always return @code{NULL}.
5351
5352There aren't separate @emph{start}, @emph{stop} events defined for the
5353event types @code{acc_ev_create}, @code{acc_ev_delete},
5354@code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these
5355should be triggered before or after the actual device-specific call is
5356made. We trigger them after.
5357
5358Remarks about data provided to callbacks:
5359
5360@table @asis
5361
5362@item @code{acc_prof_info.event_type}
5363It's not clear if for @emph{nested} event callbacks (for example,
5364@code{acc_ev_enqueue_launch_start} as part of a parent compute
5365construct), this should be set for the nested event
5366(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
5367construct should remain (@code{acc_ev_compute_construct_start}). In
this implementation, the value generally corresponds to the
innermost nested event type.
5370
5371@item @code{acc_prof_info.device_type}
5372@itemize
5373
5374@item
5375For @code{acc_ev_compute_construct_start}, and in presence of an
@code{if} clause with @emph{false} argument, this still refers to
the offloading device type.
5378It's not clear if that's the expected behavior.
5379
5380@item
5381Complementary to the item before, for
5382@code{acc_ev_compute_construct_end}, this is set to
5383@code{acc_device_host} in presence of an @code{if} clause with
5384@emph{false} argument.
5385It's not clear if that's the expected behavior.
5386
5387@end itemize
5388
5389@item @code{acc_prof_info.thread_id}
5390Always @code{-1}; not yet implemented.
5391
5392@item @code{acc_prof_info.async}
5393@itemize
5394
5395@item
5396Not yet implemented correctly for
5397@code{acc_ev_compute_construct_start}.
5398
5399@item
5400In a compute construct, for host-fallback
execution/@code{acc_device_host} it is always
@code{acc_async_sync}.
It is unclear if that is the expected behavior.
5404
5405@item
5406For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
5407it will always be @code{acc_async_sync}.
It is unclear if that is the expected behavior.
5409
5410@end itemize
5411
5412@item @code{acc_prof_info.async_queue}
5413There is no @cite{limited number of asynchronous queues} in libgomp.
This always has the same value as @code{acc_prof_info.async}.
5415
5416@item @code{acc_prof_info.src_file}
5417Always @code{NULL}; not yet implemented.
5418
5419@item @code{acc_prof_info.func_name}
5420Always @code{NULL}; not yet implemented.
5421
5422@item @code{acc_prof_info.line_no}
5423Always @code{-1}; not yet implemented.
5424
5425@item @code{acc_prof_info.end_line_no}
5426Always @code{-1}; not yet implemented.
5427
5428@item @code{acc_prof_info.func_line_no}
5429Always @code{-1}; not yet implemented.
5430
5431@item @code{acc_prof_info.func_end_line_no}
5432Always @code{-1}; not yet implemented.
5433
5434@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
5435Relating to @code{acc_prof_info.event_type} discussed above, in this
5436implementation, this will always be the same value as
5437@code{acc_prof_info.event_type}.
5438
5439@item @code{acc_event_info.*.parent_construct}
5440@itemize
5441
5442@item
5443Will be @code{acc_construct_parallel} for all OpenACC compute
5444constructs as well as many OpenACC Runtime API calls; should be the
5445one matching the actual construct, or
5446@code{acc_construct_runtime_api}, respectively.
5447
5448@item
5449Will be @code{acc_construct_enter_data} or
5450@code{acc_construct_exit_data} when processing variable mappings
5451specified in OpenACC @emph{declare} directives; should be
5452@code{acc_construct_declare}.
5453
5454@item
5455For implicit @code{acc_ev_device_init_start},
5456@code{acc_ev_device_init_end}, and explicit as well as implicit
5457@code{acc_ev_alloc}, @code{acc_ev_free},
5458@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
5459@code{acc_ev_enqueue_download_start}, and
5460@code{acc_ev_enqueue_download_end}, will be
5461@code{acc_construct_parallel}; should reflect the real parent
5462construct.
5463
5464@end itemize
5465
5466@item @code{acc_event_info.*.implicit}
5467For @code{acc_ev_alloc}, @code{acc_ev_free},
5468@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
5469@code{acc_ev_enqueue_download_start}, and
5470@code{acc_ev_enqueue_download_end}, this currently will be @code{1}
5471also for explicit usage.
5472
5473@item @code{acc_event_info.data_event.var_name}
5474Always @code{NULL}; not yet implemented.
5475
5476@item @code{acc_event_info.data_event.host_ptr}
5477For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always
5478@code{NULL}.
5479
5480@item @code{typedef union acc_api_info}
5481@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
5482Information}. This should obviously be @code{typedef @emph{struct}
5483acc_api_info}.
5484
5485@item @code{acc_api_info.device_api}
5486Possibly not yet implemented correctly for
5487@code{acc_ev_compute_construct_start},
5488@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
5489will always be @code{acc_device_api_none} for these event types.
5490For @code{acc_ev_enter_data_start}, it will be
5491@code{acc_device_api_none} in some cases.
5492
5493@item @code{acc_api_info.device_type}
5494Always the same as @code{acc_prof_info.device_type}.
5495
5496@item @code{acc_api_info.vendor}
5497Always @code{-1}; not yet implemented.
5498
5499@item @code{acc_api_info.device_handle}
5500Always @code{NULL}; not yet implemented.
5501
5502@item @code{acc_api_info.context_handle}
5503Always @code{NULL}; not yet implemented.
5504
5505@item @code{acc_api_info.async_handle}
5506Always @code{NULL}; not yet implemented.
5507
5508@end table
5509
5510Remarks about certain event types:
5511
5512@table @asis
5513
5514@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
5515@itemize
5516
5517@item
5518@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
5519@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
5520@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
5521When a compute construct triggers implicit
5522@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
5523events, they currently aren't @emph{nested within} the corresponding
5524@code{acc_ev_compute_construct_start} and
5525@code{acc_ev_compute_construct_end}, but they're currently observed
5526@emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do: the standard asks us to provide a lot of
details to the @code{acc_ev_compute_construct_start} callback, without
(implicitly) initializing a device before?
5530
5531@item
5532Callbacks for these event types will not be invoked for calls to the
5533@code{acc_set_device_type} and @code{acc_set_device_num} functions.
5534It's not clear if they should be.
5535
5536@end itemize
5537
5538@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
5539@itemize
5540
5541@item
5542Callbacks for these event types will also be invoked for OpenACC
5543@emph{host_data} constructs.
5544It's not clear if they should be.
5545
5546@item
5547Callbacks for these event types will also be invoked when processing
5548variable mappings specified in OpenACC @emph{declare} directives.
5549It's not clear if they should be.
5550
5551@end itemize
5552
5553@end table
5554
Callbacks for the following event types will be invoked, but dispatch
and information provided therein have not yet been thoroughly reviewed:
5557
5558@itemize
5559@item @code{acc_ev_alloc}
5560@item @code{acc_ev_free}
5561@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
5562@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
5563@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
5564@end itemize
5565
5566During device initialization, and finalization, respectively,
5567callbacks for the following event types will not yet be invoked:
5568
5569@itemize
5570@item @code{acc_ev_alloc}
5571@item @code{acc_ev_free}
5572@end itemize
5573
5574Callbacks for the following event types have not yet been implemented,
5575so currently won't be invoked:
5576
5577@itemize
5578@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
5579@item @code{acc_ev_runtime_shutdown}
5580@item @code{acc_ev_create}, @code{acc_ev_delete}
5581@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
5582@end itemize
5583
5584For the following runtime library functions, not all expected
5585callbacks will be invoked (mostly concerning implicit device
5586initialization):
5587
5588@itemize
5589@item @code{acc_get_num_devices}
5590@item @code{acc_set_device_type}
5591@item @code{acc_get_device_type}
5592@item @code{acc_set_device_num}
5593@item @code{acc_get_device_num}
5594@item @code{acc_init}
5595@item @code{acc_shutdown}
5596@end itemize
5597
5598Aside from implicit device initialization, for the following runtime
5599library functions, no callbacks will be invoked for shared-memory
5600offloading devices (it's not clear if they should be):
5601
5602@itemize
5603@item @code{acc_malloc}
5604@item @code{acc_free}
5605@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
5606@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
5607@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
5608@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
5609@item @code{acc_update_device}, @code{acc_update_device_async}
5610@item @code{acc_update_self}, @code{acc_update_self_async}
5611@item @code{acc_map_data}, @code{acc_unmap_data}
5612@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
5613@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
5614@end itemize
5615
5616@c ---------------------------------------------------------------------
5617@c OpenMP-Implementation Specifics
5618@c ---------------------------------------------------------------------
5619
5620@node OpenMP-Implementation Specifics
5621@chapter OpenMP-Implementation Specifics
5622
5623@menu
* Implementation-defined ICV Initialization::
* OpenMP Context Selectors::
* Memory allocation::
5627@end menu
5628
5629@node Implementation-defined ICV Initialization
5630@section Implementation-defined ICV Initialization
5631@cindex Implementation specific setting
5632
5633@multitable @columnfractions .30 .70
5634@item @var{affinity-format-var} @tab See @ref{OMP_AFFINITY_FORMAT}.
5635@item @var{def-allocator-var} @tab See @ref{OMP_ALLOCATOR}.
5636@item @var{max-active-levels-var} @tab See @ref{OMP_MAX_ACTIVE_LEVELS}.
5637@item @var{dyn-var} @tab See @ref{OMP_DYNAMIC}.
@item @var{nthreads-var} @tab See @ref{OMP_NUM_THREADS}.
@item @var{num-devices-var} @tab Number of non-host devices found
by GCC's run-time library.
5641@item @var{num-procs-var} @tab The number of CPU cores on the
5642initial device, except that affinity settings might lead to a
5643smaller number. On non-host devices, the value of the
5644@var{nthreads-var} ICV.
5645@item @var{place-partition-var} @tab See @ref{OMP_PLACES}.
5646@item @var{run-sched-var} @tab See @ref{OMP_SCHEDULE}.
5647@item @var{stacksize-var} @tab See @ref{OMP_STACKSIZE}.
@item @var{thread-limit-var} @tab See @ref{OMP_TEAMS_THREAD_LIMIT}.
@item @var{wait-policy-var} @tab See @ref{OMP_WAIT_POLICY} and
@ref{GOMP_SPINCOUNT}.
5651@end multitable
5652
5653@node OpenMP Context Selectors
5654@section OpenMP Context Selectors
5655
5656@code{vendor} is always @code{gnu}. References are to the GCC manual.
5657
5658@c NOTE: Only the following selectors have been implemented. To add
5659@c additional traits for target architecture, TARGET_OMP_DEVICE_KIND_ARCH_ISA
5660@c has to be implemented; cf. also PR target/105640.
5661@c For offload devices, add *additionally* gcc/config/*/t-omp-device.
5662
5663For the host compiler, @code{kind} always matches @code{host}; for the
5664offloading architectures AMD GCN and Nvidia PTX, @code{kind} always matches
5665@code{gpu}. For the x86 family of computers, AMD GCN and Nvidia PTX
5666the following traits are supported in addition; while OpenMP is supported
5667on more architectures, GCC currently does not match any @code{arch} or
5668@code{isa} traits for those.
5669
5670@multitable @columnfractions .65 .30
5671@headitem @code{arch} @tab @code{isa}
@item @code{x86}, @code{x86_64}, @code{i386}, @code{i486},
      @code{i586}, @code{i686}, @code{ia32}
      @tab See @code{-m...} flags in ``x86 Options'' (without @code{-m})
@item @code{amdgcn}, @code{gcn}
      @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally,
      @code{gfx803} is supported as an alias for @code{fiji}.}
@item @code{nvptx}
      @tab See @code{-march=} in ``Nvidia PTX Options''
5680@end multitable
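
As a minimal sketch of how the @code{kind} trait is used, the following C
fragment declares a variant that is only selected in a context whose device
kind matches @code{gpu}.  The function names @code{device_kind},
@code{device_kind_gpu} and @code{device_kind_on_host} are hypothetical and
chosen for illustration only.  Called from host code, or when compiled
without @option{-fopenmp}, the base function is expected to be used.

```c
/* Hypothetical variant: a candidate only where the device kind is gpu,
   for example inside a target region offloaded to AMD GCN or Nvidia PTX.  */
int device_kind_gpu (void)
{
  return 1;
}

/* Base function: used on the host, where kind matches host instead.  */
#pragma omp declare variant (device_kind_gpu) match (device = {kind (gpu)})
int device_kind (void)
{
  return 0;
}

/* Called from host code, the call resolves to the base function.  */
int device_kind_on_host (void)
{
  return device_kind ();
}
```

Adding an @code{arch} or @code{isa} selector to the @code{match} clause
narrows the variant further, using the values listed in the table above.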
5681
5682@node Memory allocation
5683@section Memory allocation
5685The description below applies to:
5686
5687@itemize
5688@item Explicit use of the OpenMP API routines, see
5689 @ref{Memory Management Routines}.
5690@item The @code{allocate} clause, except when the @code{allocator} modifier is a
5691 constant expression with value @code{omp_default_mem_alloc} and no
5692 @code{align} modifier has been specified. (In that case, the normal
5693 @code{malloc} allocation is used.)
5694@item Using the @code{allocate} directive for automatic/stack variables, except
5695 when the @code{allocator} clause is a constant expression with value
5696 @code{omp_default_mem_alloc} and no @code{align} clause has been
5697 specified. (In that case, the normal allocation is used: stack allocation
5698 and, sometimes for Fortran, also @code{malloc} [depending on flags such as
5699 @option{-fstack-arrays}].)
@item Using the @code{allocate} directive for variables in static memory is
      currently not supported (compile-time error).
@item In Fortran, the @code{allocators} directive and the executable
      @code{allocate} directive for Fortran pointers and allocatables are
      supported, but files containing those directives must be
      compiled with @option{-fopenmp-allocators}.  Additionally, all files that
      might explicitly or implicitly deallocate memory allocated that way must
      also be compiled with that option.
5708@end itemize
5709
5710For the available predefined allocators and, as applicable, their associated
5711predefined memory spaces and for the available traits and their default values,
5712see @ref{OMP_ALLOCATOR}. Predefined allocators without an associated memory
5713space use the @code{omp_default_mem_space} memory space.
5714
5715For the memory spaces, the following applies:
5716@itemize
5717@item @code{omp_default_mem_space} is supported
5718@item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
@item @code{omp_low_lat_mem_space} is only available on supported devices,
      and maps to @code{omp_default_mem_space} otherwise.
5721@item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
5722 unless the memkind library is available
5723@item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
5724 unless the memkind library is available
5725@end itemize
5726
5727On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
5728library} (@code{libmemkind.so.0}) is available at runtime, it is used when
5729creating memory allocators requesting
5730
5731@itemize
5732@item the memory space @code{omp_high_bw_mem_space}
5733@item the memory space @code{omp_large_cap_mem_space}
@item the @code{partition} trait @code{interleaved}; note that for
      @code{omp_large_cap_mem_space} the allocation will not be interleaved
5736@end itemize
5737
On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
library} (@code{libnuma.so.1}) is available at runtime, it is used when creating
memory allocators requesting
5741
5742@itemize
5743@item the @code{partition} trait @code{nearest}, except when both the
5744libmemkind library is available and the memory space is either
5745@code{omp_large_cap_mem_space} or @code{omp_high_bw_mem_space}
5746@end itemize
5747
5748Note that the numa library will round up the allocation size to a multiple of
5749the system page size; therefore, consider using it only with large data or
5750by sharing allocations via the @code{pool_size} trait. Furthermore, the Linux
5751kernel does not guarantee that an allocation will always be on the nearest NUMA
5752node nor that after reallocation the same node will be used. Note additionally
5753that, on Linux, the default setting of the memory placement policy is to use the
5754current node; therefore, unless the memory placement policy has been overridden,
5755the @code{partition} trait @code{environment} (the default) will be effectively
5756a @code{nearest} allocation.
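
The following is a minimal sketch, not a recommendation, of requesting the
@code{partition} trait @code{nearest} through the OpenMP allocator routines.
The function name @code{sum_with_nearest_partition} is hypothetical.  Whether
the numa library is actually used depends on its availability at run time, as
described above.  The @code{_OPENMP} guard is only there so that the example
also builds without @option{-fopenmp}, in which case plain @code{malloc} is
used.

```c
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Fill a buffer of N doubles with 0..N-1 and return their sum.  With
   OpenMP, the buffer comes from an allocator that requests the
   'nearest' partition.  */
double sum_with_nearest_partition (size_t n)
{
  double sum = 0.0;
#ifdef _OPENMP
  omp_alloctrait_t traits[] = {
    { omp_atk_partition, omp_atv_nearest },
    /* If the request cannot be honored, fall back to default memory.  */
    { omp_atk_fallback, omp_atv_default_mem_fb }
  };
  omp_allocator_handle_t al
    = omp_init_allocator (omp_default_mem_space, 2, traits);
  double *buf = (double *) omp_alloc (n * sizeof (double), al);
#else
  double *buf = (double *) malloc (n * sizeof (double));
#endif

  if (buf != NULL)
    {
      for (size_t i = 0; i < n; i++)
        buf[i] = (double) i;
      for (size_t i = 0; i < n; i++)
        sum += buf[i];
    }

#ifdef _OPENMP
  omp_free (buf, al);
  if (al != omp_null_allocator)
    omp_destroy_allocator (al);
#else
  free (buf);
#endif
  return sum;
}
```

Because the numa library rounds each allocation up to a multiple of the page
size, a pattern like this is better suited to a few large buffers than to
many small ones.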
5757
Additional notes regarding the traits:
@itemize
@item The @code{pinned} trait is supported on Linux hosts, but is subject to
      the OS @code{ulimit}/@code{rlimit} locked memory settings.
5762@item The default for the @code{pool_size} trait is no pool and for every
5763 (re)allocation the associated library routine is called, which might
5764 internally use a memory pool.
5765@item For the @code{partition} trait, the partition part size will be the same
5766 as the requested size (i.e. @code{interleaved} or @code{blocked} has no
5767 effect), except for @code{interleaved} when the memkind library is
      available.  Furthermore, for @code{nearest} and unless the numa library
      is available, the memory might not be on the same NUMA node as the thread
      that allocated the memory; on Linux, this is in particular the case when
      the memory placement policy is set to preferred.
5772@item The @code{access} trait has no effect such that memory is always
5773 accessible by all threads.
5774@item The @code{sync_hint} trait has no effect.
5775@end itemize
5777See also:
5778@ref{Offload-Target Specifics}
5779
5780@c ---------------------------------------------------------------------
5781@c Offload-Target Specifics
5782@c ---------------------------------------------------------------------
5783
5784@node Offload-Target Specifics
5785@chapter Offload-Target Specifics
5786
The following sections present notes on the offload-target specifics.
5788
5789@menu
5790* AMD Radeon::
5791* nvptx::
5792@end menu
5793
5794@node AMD Radeon
5795@section AMD Radeon (GCN)
5796
5797On the hardware side, there is the hierarchy (fine to coarse):
5798@itemize
5799@item work item (thread)
5800@item wavefront
5801@item work group
@item compute unit (CU)
5803@end itemize
5804
5805All OpenMP and OpenACC levels are used, i.e.
5806@itemize
5807@item OpenMP's simd and OpenACC's vector map to work items (thread)
5808@item OpenMP's threads (``parallel'') and OpenACC's workers map
5809 to wavefronts
5810@item OpenMP's teams and OpenACC's gang use a threadpool with the
5811 size of the number of teams or gangs, respectively.
5812@end itemize
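
To illustrate the mapping above, here is a small sketch of a combined
construct that exercises all three levels.  The function names @code{saxpy}
and @code{saxpy_demo} are hypothetical.  Which hardware units actually execute
the loop depends on the device and on the sizes described in this section.
Without an available offload device, or without @option{-fopenmp}, the loop
simply runs on the host.

```c
/* y[i] += a * x[i].  On a GCN device, 'teams' is distributed over compute
   units, 'parallel for' over wavefronts and 'simd' over work items.  */
void saxpy (int n, float a, const float *x, float *y)
{
#pragma omp target teams distribute parallel for simd \
        map (to: x[0:n]) map (tofrom: y[0:n])
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}

/* Small driver: returns y[3] after the update, i.e. 1 + 2 * 4.  */
float saxpy_demo (void)
{
  float x[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  float y[4] = { 1.0f, 1.0f, 1.0f, 1.0f };

  saxpy (4, 2.0f, x, y);
  return y[3];
}
```

Adding @code{num_teams} or @code{thread_limit} clauses to the construct
changes the number of teams and threads as outlined in the list of sizes.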
5813
The sizes used are:
@itemize
@item Number of teams is the specified @code{num_teams} (OpenMP) or
      @code{num_gangs} (OpenACC) or otherwise the number of CUs.  It is limited
      to two times the number of CUs.
@item Number of wavefronts is 4 for gfx900 and 16 otherwise;
      @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
      override this if smaller.
5822@item The wavefront has 102 scalars and 64 vectors
5823@item Number of workitems is always 64
5824@item The hardware permits maximally 40 workgroups/CU and
5825 16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
@item 80 scalar registers and 24 vector registers in non-kernel functions
      (the chosen procedure-calling API).
5828@item For the kernel itself: as many as register pressure demands (number of
5829 teams and number of threads, scaled down if registers are exhausted)
5830@end itemize
5831
Implementation remarks:
5833@itemize
5834@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
5835 using the C library @code{printf} functions and the Fortran
5836 @code{print}/@code{write} statements.
@item Reverse offload regions (i.e. @code{target} regions with
5838 @code{device(ancestor:1)}) are processed serially per @code{target} region
5839 such that the next reverse offload region is only executed after the previous
5840 one returned.
@item OpenMP code that has a @code{requires} directive with
5842 @code{unified_shared_memory} will remove any GCN device from the list of
5843 available devices (``host fallback'').
5844@item The available stack size can be changed using the @code{GCN_STACK_SIZE}
5845 environment variable; the default is 32 kiB per thread.
@item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when
      the @code{access} trait is set to @code{cgroup}.  The default pool size
5848 is automatically scaled to share the 64 kiB LDS memory between the number
5849 of teams configured to run on each compute-unit, but may be adjusted at
5850 runtime by setting environment variable
5851 @code{GOMP_GCN_LOWLAT_POOL=@var{bytes}}.
5852@item @code{omp_low_lat_mem_alloc} cannot be used with true low-latency memory
5853 because the definition implies the @code{omp_atv_all} trait; main
5854 graphics memory is used instead.
5855@item @code{omp_cgroup_mem_alloc}, @code{omp_pteam_mem_alloc}, and
5856 @code{omp_thread_mem_alloc}, all use low-latency memory as first
5857 preference, and fall back to main graphics memory when the low-latency
5858 pool is exhausted.
5859@end itemize
5860
5861
5862
5863@node nvptx
5864@section nvptx
5865
5866On the hardware side, there is the hierarchy (fine to coarse):
5867@itemize
5868@item thread
5869@item warp
5870@item thread block
5871@item streaming multiprocessor
5872@end itemize
5873
5874All OpenMP and OpenACC levels are used, i.e.
5875@itemize
5876@item OpenMP's simd and OpenACC's vector map to threads
5877@item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
5878@item OpenMP's teams and OpenACC's gang use a threadpool with the
5879 size of the number of teams or gangs, respectively.
5880@end itemize
5881
The sizes used are:
5883@itemize
5884@item The @code{warp_size} is always 32
5885@item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
@item The number of teams is limited by the number of blocks the device can
      host simultaneously.
5888@end itemize
5889
Additional information can be obtained by setting the environment variable
@code{GOMP_DEBUG=1} (very verbose; grep for @code{kernel.*launch} for launch
parameters).

GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA,
which caches the JIT in the user's directory (see CUDA documentation; can be
tuned by the environment variables @code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).
5897
Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=} command-line
options still affect the used PTX ISA code and, thus, the requirements on
CUDA version and hardware.
5901
Implementation remarks:
5903@itemize
5904@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
      using the C library @code{printf} functions.  Note that the Fortran
      @code{print}/@code{write} statements are not yet supported.
@item Compiling OpenMP code that contains @code{requires reverse_offload}
      requires at least @code{-march=sm_35}; compiling for @code{-march=sm_30}
      is not supported.
@item For code containing reverse offload (i.e. @code{target} regions with
      @code{device(ancestor:1)}), there is a slight performance penalty
      for @emph{all} target regions, consisting mostly of shutdown delay.
      Per device, reverse offload regions are processed serially such that
      the next reverse offload region is only executed after the previous
      one returned.
5916@item OpenMP code that has a @code{requires} directive with
5917 @code{unified_shared_memory} will remove any nvptx device from the
      list of available devices (``host fallback'').
5919@item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
5920 in the GCC manual.
25072a47
TB
5921@item The OpenMP routines @code{omp_target_memcpy_rect} and
5922 @code{omp_target_memcpy_rect_async} and the @code{target update}
5923 directive for non-contiguous list items will use the 2D and 3D
5924 memory-copy functions of the CUDA library. Higher dimensions will
5925 call those functions in a loop and are therefore supported.
@item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when
      the @code{access} trait is set to @code{cgroup}, the ISA is at least
5928 @code{sm_53}, and the PTX version is at least 4.1. The default pool size
5929 is 8 kiB per team, but may be adjusted at runtime by setting environment
5930 variable @code{GOMP_NVPTX_LOWLAT_POOL=@var{bytes}}. The maximum value is
5931 limited by the available hardware, and care should be taken that the
5932 selected pool size does not unduly limit the number of teams that can
5933 run simultaneously.
5934@item @code{omp_low_lat_mem_alloc} cannot be used with true low-latency memory
5935 because the definition implies the @code{omp_atv_all} trait; main
5936 graphics memory is used instead.
5937@item @code{omp_cgroup_mem_alloc}, @code{omp_pteam_mem_alloc}, and
5938 @code{omp_thread_mem_alloc}, all use low-latency memory as first
5939 preference, and fall back to main graphics memory when the low-latency
5940 pool is exhausted.
5941@end itemize
5942
5943
5944@c ---------------------------------------------------------------------
5945@c The libgomp ABI
5946@c ---------------------------------------------------------------------
5947
5948@node The libgomp ABI
5949@chapter The libgomp ABI
5950
5951The following sections present notes on the external ABI as
5952presented by libgomp. Only maintainers should need them.
5953
5954@menu
5955* Implementing MASTER construct::
5956* Implementing CRITICAL construct::
5957* Implementing ATOMIC construct::
5958* Implementing FLUSH construct::
5959* Implementing BARRIER construct::
5960* Implementing THREADPRIVATE construct::
5961* Implementing PRIVATE clause::
5962* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
5963* Implementing REDUCTION clause::
5964* Implementing PARALLEL construct::
5965* Implementing FOR construct::
5966* Implementing ORDERED construct::
5967* Implementing SECTIONS construct::
5968* Implementing SINGLE construct::
5969* Implementing OpenACC's PARALLEL construct::
5970@end menu
5971
5972
5973@node Implementing MASTER construct
5974@section Implementing MASTER construct
5975
@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample
5980
5981Alternately, we generate two copies of the parallel subfunction
5982and only include this in the version run by the primary thread.
5983Surely this is not worthwhile though...
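
The thread-number test can be tried out directly in user code.  The sketch
below is illustrative only: @code{count_primary_executions} is a hypothetical
name, and the stub for @code{omp_get_thread_num} merely lets the example build
without @option{-fopenmp}.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stub so that the example also builds without OpenMP.  */
static int omp_get_thread_num (void) { return 0; }
#endif

/* Count how often the guarded block runs inside one parallel region.  */
int count_primary_executions (void)
{
  int hits = 0;

#pragma omp parallel
  {
    /* Same test as the lowered form of the MASTER construct.  */
    if (omp_get_thread_num () == 0)
      hits++;   /* Only thread 0 writes, so there is no data race.  */
  }
  return hits;
}
```

Regardless of the team size, the block is executed exactly once per parallel
region.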
5984
5985
5986
5987@node Implementing CRITICAL construct
5988@section Implementing CRITICAL construct
5989
5990Without a specified name,
5991
5992@smallexample
5993 void GOMP_critical_start (void);
5994 void GOMP_critical_end (void);
5995@end smallexample
5996
5997so that we don't get COPY relocations from libgomp to the main
5998application.
5999
6000With a specified name, use omp_set_lock and omp_unset_lock with
6001name being transformed into a variable declared like
6002
6003@smallexample
6004 omp_lock_t gomp_critical_user_<name> __attribute__((common))
6005@end smallexample
6006
6007Ideally the ABI would specify that all zero is a valid unlocked
6008state, and so we wouldn't need to initialize this at
6009startup.
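
For reference, a sketch of user code containing both forms of the construct.
The function names and the critical-region name @code{even_tally} are
hypothetical.  The comments restate the lowering described above; the exact
calls emitted may differ between GCC versions.

```c
/* Unnamed critical region: per the description above, entered and left
   through GOMP_critical_start and GOMP_critical_end.  */
int count_with_critical (int n)
{
  int count = 0;

#pragma omp parallel for
  for (int i = 0; i < n; i++)
    {
#pragma omp critical
      count++;
    }
  return count;
}

/* Named critical region: per the description above, protected by a lock
   variable derived from the name.  */
int count_even_with_named_critical (int n)
{
  int evens = 0;

#pragma omp parallel for
  for (int i = 0; i < n; i++)
    if (i % 2 == 0)
      {
#pragma omp critical (even_tally)
        evens++;
      }
  return evens;
}
```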
6010
6011
6012
6013@node Implementing ATOMIC construct
6014@section Implementing ATOMIC construct
6015
6016The target should implement the @code{__sync} builtins.
6017
6018Failing that we could add
6019
6020@smallexample
6021 void GOMP_atomic_enter (void)
6022 void GOMP_atomic_exit (void)
6023@end smallexample
6024
6025which reuses the regular lock code, but with yet another lock
6026object private to the library.
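
As a sketch of the first approach, the two functions below perform the same
update, once through the ATOMIC construct and once by calling a @code{__sync}
builtin directly.  This assumes GCC or a compiler that provides the same
builtins; the function names are hypothetical.

```c
/* Increment a shared counter N times with the ATOMIC construct.  */
long count_with_atomic (long n)
{
  long counter = 0;

#pragma omp parallel for
  for (long i = 0; i < n; i++)
    {
#pragma omp atomic
      counter++;
    }
  return counter;
}

/* The same update written with the __sync builtin that the target is
   expected to implement.  */
long count_with_sync_builtin (long n)
{
  long counter = 0;

#pragma omp parallel for
  for (long i = 0; i < n; i++)
    __sync_fetch_and_add (&counter, 1L);
  return counter;
}
```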
6027
6028
6029
6030@node Implementing FLUSH construct
6031@section Implementing FLUSH construct
6032
6033Expands to the @code{__sync_synchronize} builtin.
6034
6035
6036
6037@node Implementing BARRIER construct
6038@section Implementing BARRIER construct
6039
6040@smallexample
6041 void GOMP_barrier (void)
6042@end smallexample
6043
6044
6045@node Implementing THREADPRIVATE construct
6046@section Implementing THREADPRIVATE construct
6047
6048In _most_ cases we can map this directly to @code{__thread}. Except
6049that OMP allows constructors for C++ objects. We can either
6050refuse to support this (how often is it used?) or we can
6051implement something akin to .ctors.
6052
6053Even more ideally, this ctor feature is handled by extensions
6054to the main pthreads library. Failing that, we can have a set
6055of entry points to register ctor functions to be called.
6056
6057
6058
6059@node Implementing PRIVATE clause
6060@section Implementing PRIVATE clause
6061
6062In association with a PARALLEL, or within the lexical extent
6063of a PARALLEL block, the variable becomes a local variable in
6064the parallel subfunction.
6065
6066In association with FOR or SECTIONS blocks, create a new
6067automatic variable within the current function. This preserves
6068the semantic of new variable creation.
6069
6070
6071
6072@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
6073@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
6074
6075This seems simple enough for PARALLEL blocks. Create a private
6076struct for communicating between the parent and subfunction.
In the parent, copy in values for scalars and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types.  In the
6079subfunction, copy the value into the local variable.
6080
It is not clear what to do with bare FOR or SECTIONS blocks.
6082The only thing I can figure is that we do something like:
6083
@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample
6089
6090which becomes
6091
@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample
6102
6103where the "x=x" and "y=y" assignments actually have different
6104uids for the two variables, i.e. not something you could write
6105directly in C. Presumably this only makes sense if the "outer"
6106x and y are global variables.
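
The user-visible semantics that this lowering has to preserve can be checked
with a small program.  This is a sketch only; @code{last_value} is a
hypothetical name and @var{n} is assumed to be greater than zero.

```c
/* Return the value Y holds after the loop.  */
int last_value (int n)
{
  int x = 10;   /* Copied into each thread's private X on entry.  */
  int y = -1;   /* Receives the value from the sequentially last iteration.  */

#pragma omp parallel for firstprivate (x) lastprivate (y)
  for (int i = 0; i < n; i++)
    y = x + i;

  return y;
}
```

After the loop, the outer @code{y} holds the value assigned in iteration
@code{n - 1}, independent of which thread executed that iteration.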
6107
6108COPYPRIVATE would work the same way, except the structure
6109broadcast would have to happen via SINGLE machinery instead.
6110
6111
6112
6113@node Implementing REDUCTION clause
6114@section Implementing REDUCTION clause
6115
6116The private struct mentioned in the previous section should have
6117a pointer to an array of the type of the variable, indexed by the
6118thread's @var{team_id}. The thread stores its final value into the
6119array, and after the barrier, the primary thread iterates over the
6120array to collect the values.
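
The following sketch models that scheme for a @code{+} reduction in
plain, sequential C. The threads are simulated by a loop over team
ids, so no barrier appears; in a real team the combine step would run
only after the barrier. All names (@code{struct red_data},
@code{red_thread_body}, @code{red_combine}, @code{red_sum}) and the
fixed limit @code{RED_MAX_THREADS} are assumptions made for this
example.

```c
#define RED_MAX_THREADS 16  /* hypothetical upper bound for the sketch */

/* Hypothetical communication record: one slot per team id.  */
struct red_data
{
  long *partial;
};

/* Work done by one thread: accumulate privately, then store the final
   value in the slot indexed by the thread's team id.  */
void
red_thread_body (struct red_data *d, int team_id, int nthreads,
                 const int *a, long n)
{
  long sum = 0;  /* private accumulator, identity value for + */
  for (long i = team_id; i < n; i += nthreads)
    sum += a[i];
  d->partial[team_id] = sum;
}

/* Done by the primary thread after the barrier: collect the values.  */
long
red_combine (const struct red_data *d, int nthreads)
{
  long total = 0;
  for (int t = 0; t < nthreads; t++)
    total += d->partial[t];
  return total;
}

/* Sequential driver that simulates a team of NTHREADS threads.
   NTHREADS must be between 1 and RED_MAX_THREADS.  */
long
red_sum (const int *a, long n, int nthreads)
{
  long slots[RED_MAX_THREADS];
  struct red_data d = { slots };
  for (int t = 0; t < nthreads; t++)
    red_thread_body (&d, t, nthreads, a, n);
  return red_combine (&d, nthreads);
}
```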


@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and evaluates to false; otherwise it is the value of the
NUM_THREADS clause, if present, or 0 if neither applies.
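
The rule for the @var{NUM_THREADS} argument can be written out as a
small helper. This is only a restatement of the sentence above; the
function name @code{num_threads_arg} and its parameters are
hypothetical and do not exist in libgomp.

```c
#include <stdbool.h>

/* Compute the value passed as NUM_THREADS, given which clauses are
   present on the directive and what they evaluate to.  */
unsigned
num_threads_arg (bool has_if, bool if_value,
                 bool has_num_threads, unsigned clause_value)
{
  if (has_if && !if_value)
    return 1;             /* IF clause present and false */
  if (has_num_threads)
    return clause_value;  /* value of the NUM_THREADS clause */
  return 0;               /* neither applies */
}
```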

The function needs to create the appropriate number of
threads and/or launch them from the dock. It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.



@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
    @{
      long _e1 = _e0, i;
      for (i = _s0; i < _e1; i++)
        body;
    @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample
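
To make the start/next protocol of the expansion above easier to
follow, here is a single-threaded model of it. The functions
@code{model_loop_start} and @code{model_loop_next} are hypothetical
stand-ins that hand out fixed-size chunks of @code{[start, end)};
they are not the libgomp routines, take an explicit work-share
argument that the real routines do not, and ignore locking entirely.

```c
#include <stdbool.h>

/* Hypothetical work-share state: next unassigned iteration, end of the
   iteration space, and chunk size (which must be positive).  */
struct model_loop
{
  long next, end, chunk;
};

/* Hand out the next chunk as [*s, *e); return false when none is left.  */
bool
model_loop_next (struct model_loop *ws, long *s, long *e)
{
  if (ws->next >= ws->end)
    return false;
  *s = ws->next;
  *e = (ws->end - *s < ws->chunk) ? ws->end : *s + ws->chunk;
  ws->next = *e;
  return true;
}

/* Initialize the work share and hand out the first chunk.  */
bool
model_loop_start (struct model_loop *ws, long start, long end, long chunk,
                  long *s, long *e)
{
  ws->next = start;
  ws->end = end;
  ws->chunk = chunk;
  return model_loop_next (ws, s, e);
}

/* Same shape as the expansion in the text; the body sums the visited
   iteration numbers so that the result can be checked.  */
long
model_run_loop (long n, long chunk)
{
  struct model_loop ws;
  long i, s0, e0, sum = 0;
  if (model_loop_start (&ws, 0, n, chunk, &s0, &e0))
    do
      {
        long e1 = e0;
        for (i = s0; i < e1; i++)
          sum += i;
      }
    while (model_loop_next (&ws, &s0, &e0));
  return sum;
}
```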

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really. We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables. So the expression should remain evaluable in the
subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we need not.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations. This would mean that we wouldn't need to call any
of these routines.
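
One possible form of that per-thread arithmetic is sketched below,
assuming a step of 1 and an iteration space @code{[lb, ub)}. The
function name @code{static_chunk} is hypothetical, and the way the
remainder is distributed is just one reasonable choice; it is not
claimed to match what libgomp or the compiler actually does.

```c
/* Compute the contiguous block [*s, *e) of iterations that thread TID
   out of NTHREADS executes, using only values every thread already
   knows.  The first (n % nthreads) threads get one extra iteration.  */
void
static_chunk (long lb, long ub, int tid, int nthreads, long *s, long *e)
{
  long n = ub > lb ? ub - lb : 0;  /* total number of iterations */
  long q = n / nthreads;           /* base block size */
  long r = n % nthreads;           /* leftover iterations */
  long start = tid * q + (tid < r ? tid : r);
  long len = q + (tid < r ? 1 : 0);

  *s = lb + start;
  *e = *s + len;
}
```

Each thread would call this with its own team id and then run
@code{for (i = s; i < e; i++) body;} without touching any shared
state.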

There are separate routines for handling loops with an ORDERED
clause. Bookkeeping for that is non-trivial...



@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block like

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample
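
The dispatch loop above can be tried out with a single-threaded model
of the two routines. The functions @code{model_sections_start} and
@code{model_sections_next} below are hypothetical stand-ins: they
number the sections from 1, return 0 when none is left, and take an
explicit state argument that the real routines do not.

```c
/* Hypothetical work-share state for a SECTIONS construct.  */
struct model_sections
{
  unsigned next;   /* next section number to hand out, starting at 1 */
  unsigned count;  /* total number of sections */
};

/* Return the next unclaimed section number, or 0 if all are taken.  */
unsigned
model_sections_next (struct model_sections *ws)
{
  if (ws->next > ws->count)
    return 0;
  return ws->next++;
}

/* Initialize the state and return the first section number.  */
unsigned
model_sections_start (struct model_sections *ws, unsigned count)
{
  ws->next = 1;
  ws->count = count;
  return model_sections_next (ws);
}

/* Same shape as the expansion in the text.  RAN[k] counts how often
   section k was executed; the return value is the total.  */
unsigned
model_run_sections (unsigned ran[4])
{
  struct model_sections ws;
  unsigned i, executed = 0;

  for (i = model_sections_start (&ws, 3); i != 0;
       i = model_sections_next (&ws))
    switch (i)
      {
      case 1:
      case 2:
      case 3:
        ran[i]++;  /* stands in for stmt1, stmt2, stmt3 */
        executed++;
        break;
      }
  return executed;
}
```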


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample
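
The control flow of the COPYPRIVATE expansion can be modeled
sequentially as follows. This is a deliberately simplified sketch:
the ``threads'' are successive calls, the first caller executes the
body and publishes its record, and later callers read from it. The
names @code{model_single_copy_start}, @code{model_single_copy_end}
and @code{model_run_single} are hypothetical, the record is supplied
by the caller so that it outlives the call, and none of the
synchronization a real implementation needs is shown.

```c
#include <stddef.h>

/* Hypothetical broadcast record for COPYPRIVATE(x).  */
struct copy_data
{
  int x;
};

/* Hypothetical work-share state for the SINGLE construct.  */
struct model_single
{
  int claimed;      /* nonzero once some caller has taken the body */
  void *published;  /* record published by the executing caller */
};

/* Return NULL to the caller that should execute the body, and the
   published record to every other caller.  */
void *
model_single_copy_start (struct model_single *ws)
{
  if (!ws->claimed)
    {
      ws->claimed = 1;
      return NULL;
    }
  return ws->published;
}

/* Called by the executing caller to publish its record.  */
void
model_single_copy_end (struct model_single *ws, void *data)
{
  ws->published = data;
}

/* Same shape as the expansion in the text; returns this caller's X.  */
int
model_run_single (struct model_single *ws, struct copy_data *data, int x)
{
  struct copy_data *datap = model_single_copy_start (ws);
  if (datap == NULL)
    {
      x = 42;  /* stands in for body, which assigns x */
      data->x = x;
      model_single_copy_end (ws, data);
    }
  else
    x = datap->x;
  return x;
}
```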



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
"openacc", or "openmp", or both to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye