\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2023 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page


@node Top, Enabling OpenMP
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Support
for OpenACC and offloading (both via OpenACC and via OpenMP 4's
@code{target} construct) was added later, and the library was renamed
the GNU Offloading and Multi Processing Runtime Library.



@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Implementation Status:: List of implemented features by OpenMP version
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* OpenACC Profiling Interface::
* OpenMP-Implementation Specifics:: Notes on specifics of this OpenMP
                               implementation
* Offload-Target Specifics::   Notes on offload-target specific internals
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU general public license says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
@code{#pragma omp} in C/C++ and, for Fortran, the @code{!$omp} directives
in free source form and the @code{c$omp}, @code{*$omp} and @code{!$omp}
directives in fixed source form, as well as the @code{!$} conditional
compilation sentinels in free form and the @code{c$}, @code{*$} and
@code{!$} sentinels in fixed form.  The flag also arranges for automatic
linking of the OpenMP runtime library (@ref{Runtime Library Routines}).

A complete description of all OpenMP directives may be found in the
@uref{https://www.openmp.org, OpenMP Application Program Interface} manuals.
See also @ref{OpenMP Implementation Status}.


@c ---------------------------------------------------------------------
@c OpenMP Implementation Status
@c ---------------------------------------------------------------------

@node OpenMP Implementation Status
@chapter OpenMP Implementation Status

@menu
* OpenMP 4.5::                 Feature completion status to 4.5 specification
* OpenMP 5.0::                 Feature completion status to 5.0 specification
* OpenMP 5.1::                 Feature completion status to 5.1 specification
* OpenMP 5.2::                 Feature completion status to 5.2 specification
* OpenMP Technical Report 11:: Feature completion status to first 6.0 preview
@end menu

The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
parameter, provided by @code{omp_lib.h} and the @code{omp_lib} module, have
the value @code{201511} (i.e. OpenMP 4.5).

@node OpenMP 4.5
@section OpenMP 4.5

The OpenMP 4.5 specification is fully supported.

@node OpenMP 5.0
@section OpenMP 5.0

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Array shaping @tab N @tab
@item Array sections with non-unit strides in C and C++ @tab N @tab
@item Iterators @tab Y @tab
@item @code{metadirective} directive @tab N @tab
@item @code{declare variant} directive
      @tab P @tab @emph{simd} traits not handled correctly
@item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
      env variable @tab Y @tab
@item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
@item @code{requires} directive @tab P
      @tab complete but no non-host device provides @code{unified_shared_memory}
@item @code{teams} construct outside an enclosing target region @tab Y @tab
@item Non-rectangular loop nests @tab P
      @tab Full support for C/C++, partial for Fortran
           (@uref{https://gcc.gnu.org/PR110735,PR110735})
@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
      constructs @tab Y @tab
@item Collapse of associated loops that are imperfectly nested loops @tab Y @tab
@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
      @code{simd} construct @tab Y @tab
@item @code{atomic} constructs in @code{simd} @tab Y @tab
@item @code{loop} construct @tab Y @tab
@item @code{order(concurrent)} clause @tab Y @tab
@item @code{scan} directive and @code{in_scan} modifier for the
      @code{reduction} clause @tab Y @tab
@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
@item @code{in_reduction} clause on @code{target} constructs @tab P
      @tab @code{nowait} only stub
@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
@item @code{task} modifier to @code{reduction} clause @tab Y @tab
@item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
@item @code{detach} clause to @code{task} construct @tab Y @tab
@item @code{omp_fulfill_event} runtime routine @tab Y @tab
@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
      and @code{taskloop simd} constructs @tab Y @tab
@item @code{taskloop} construct cancelable by @code{cancel} construct
      @tab Y @tab
@item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
      @tab Y @tab
@item Predefined memory spaces, memory allocators, allocator traits
      @tab Y @tab See also @ref{Memory allocation}
@item Memory management routines @tab Y @tab
@item @code{allocate} directive @tab N @tab
@item @code{allocate} clause @tab P @tab Initial support
@item @code{use_device_addr} clause on @code{target data} @tab Y @tab
@item @code{ancestor} modifier on @code{device} clause @tab Y @tab
@item Implicit declare target directive @tab Y @tab
@item Discontiguous array section with @code{target update} construct
      @tab N @tab
@item C/C++'s lvalue expressions in @code{to}, @code{from}
      and @code{map} clauses @tab N @tab
@item C/C++'s lvalue expressions in @code{depend} clauses @tab Y @tab
@item Nested @code{declare target} directive @tab Y @tab
@item Combined @code{master} constructs @tab Y @tab
@item @code{depend} clause on @code{taskwait} @tab Y @tab
@item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
      @tab Y @tab
@item @code{hint} clause on the @code{atomic} construct @tab Y @tab Stub only
@item @code{depobj} construct and depend objects @tab Y @tab
@item Lock hints were renamed to synchronization hints @tab Y @tab
@item @code{conditional} modifier to @code{lastprivate} clause @tab Y @tab
@item Map-order clarifications @tab P @tab
@item @code{close} @emph{map-type-modifier} @tab Y @tab
@item Mapping C/C++ pointer variables and to assign the address of
      device memory mapped by an array section @tab P @tab
@item Mapping of Fortran pointer and allocatable variables, including pointer
      and allocatable components of variables
      @tab P @tab Mapping of vars with allocatable components unsupported
@item @code{defaultmap} extensions @tab Y @tab
@item @code{declare mapper} directive @tab N @tab
@item @code{omp_get_supported_active_levels} routine @tab Y @tab
@item Runtime routines and environment variables to display runtime thread
      affinity information @tab Y @tab
@item @code{omp_pause_resource} and @code{omp_pause_resource_all} runtime
      routines @tab Y @tab
@item @code{omp_get_device_num} runtime routine @tab Y @tab
@item OMPT interface @tab N @tab
@item OMPD interface @tab N @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.0 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Supporting C++'s range-based for loop @tab Y @tab
@end multitable


@node OpenMP 5.1
@section OpenMP 5.1

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item OpenMP directive as C++ attribute specifiers @tab Y @tab
@item @code{omp_all_memory} reserved locator @tab Y @tab
@item @emph{target_device trait} in OpenMP Context @tab N @tab
@item @code{target_device} selector set in context selectors @tab N @tab
@item C/C++'s @code{declare variant} directive: elision support of
      preprocessed code @tab N @tab
@item @code{declare variant}: new clauses @code{adjust_args} and
      @code{append_args} @tab N @tab
@item @code{dispatch} construct @tab N @tab
@item device-specific ICV settings with environment variables @tab Y @tab
@item @code{assume} and @code{assumes} directives @tab Y @tab
@item @code{nothing} directive @tab Y @tab
@item @code{error} directive @tab Y @tab
@item @code{masked} construct @tab Y @tab
@item @code{scope} directive @tab Y @tab
@item Loop transformation constructs @tab N @tab
@item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
      clauses of the @code{taskloop} construct @tab Y @tab
@item @code{align} clause in @code{allocate} directive @tab N @tab
@item @code{align} modifier in @code{allocate} clause @tab Y @tab
@item @code{thread_limit} clause to @code{target} construct @tab Y @tab
@item @code{has_device_addr} clause to @code{target} construct @tab Y @tab
@item Iterators in @code{target update} motion clauses and @code{map}
      clauses @tab N @tab
@item Indirect calls to the device version of a procedure or function in
      @code{target} regions @tab N @tab
@item @code{interop} directive @tab N @tab
@item @code{omp_interop_t} object support in runtime routines @tab N @tab
@item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
@item Extensions to the @code{atomic} directive @tab Y @tab
@item @code{seq_cst} clause on a @code{flush} construct @tab Y @tab
@item @code{inoutset} argument to the @code{depend} clause @tab Y @tab
@item @code{private} and @code{firstprivate} argument to @code{default}
      clause in C and C++ @tab Y @tab
@item @code{present} argument to @code{defaultmap} clause @tab Y @tab
@item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
      @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
      routines @tab Y @tab
@item @code{omp_target_is_accessible} runtime routine @tab Y @tab
@item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
      runtime routines @tab Y @tab
@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
@item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
      @code{omp_aligned_calloc} runtime routines @tab Y @tab
@item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
      @code{omp_atv_default} changed @tab Y @tab
@item @code{omp_display_env} runtime routine @tab Y @tab
@item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
@item @code{ompt_sync_region_t} enum additions @tab N @tab
@item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}
      and @code{ompt_state_wait_barrier_teams} @tab N @tab
@item @code{ompt_callback_target_data_op_emi_t},
      @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
      and @code{ompt_callback_target_submit_emi_t} @tab N @tab
@item @code{ompt_callback_error_t} type @tab N @tab
@item @code{OMP_PLACES} syntax extensions @tab Y @tab
@item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
      variables @tab Y @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.1 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Support of strictly structured blocks in Fortran @tab Y @tab
@item Support of structured block sequences in C/C++ @tab Y @tab
@item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
      clause @tab Y @tab
@item Support @code{begin/end declare target} syntax in C/C++ @tab Y @tab
@item Pointer predetermined firstprivate getting initialized
      to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item For Fortran, diagnose placing declarative before/between @code{USE},
      @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
@item @code{indirect} clause in @code{declare target} @tab N @tab
@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
@item @code{present} modifier to the @code{map}, @code{to} and @code{from}
      clauses @tab Y @tab
@end multitable


@node OpenMP 5.2
@section OpenMP 5.2

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item @code{omp_in_explicit_task} routine and @var{explicit-task-var} ICV
      @tab Y @tab
@item @code{omp}/@code{ompx}/@code{omx} sentinels and @code{omp_}/@code{ompx_}
      namespaces @tab N/A
      @tab warning for @code{ompx/omx} sentinels@footnote{The @code{ompx}
      sentinel as C/C++ pragma and C++ attributes are warned for with
      @code{-Wunknown-pragmas} (implied by @code{-Wall}) and @code{-Wattributes}
      (enabled by default), respectively; for Fortran free-source code, there is
      a warning enabled by default and, for fixed-source code, the @code{omx}
      sentinel is warned for with @code{-Wsurprising} (enabled by
      @code{-Wall}).  Unknown clauses are always rejected with an error.}
@item Clauses on @code{end} directive can be on directive @tab Y @tab
@item @code{destroy} clause with destroy-var argument on @code{depobj}
      @tab N @tab
@item Deprecation of no-argument @code{destroy} clause on @code{depobj}
      @tab N @tab
@item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
@item Deprecation of minus operator for reductions @tab N @tab
@item Deprecation of separating @code{map} modifiers without comma @tab N @tab
@item @code{declare mapper} with iterator and @code{present} modifiers
      @tab N @tab
@item If a matching mapped list item is not found in the data environment, the
      pointer retains its original value @tab Y @tab
@item New @code{enter} clause as alias for @code{to} on declare target directive
      @tab Y @tab
@item Deprecation of @code{to} clause on declare target directive @tab N @tab
@item Extended list of directives permitted in Fortran pure procedures
      @tab Y @tab
@item New @code{allocators} directive for Fortran @tab N @tab
@item Deprecation of @code{allocate} directive for Fortran
      allocatables/pointers @tab N @tab
@item Optional paired @code{end} directive with @code{dispatch} @tab N @tab
@item New @code{memspace} and @code{traits} modifiers for @code{uses_allocators}
      @tab N @tab
@item Deprecation of traits array following the allocator_handle expression in
      @code{uses_allocators} @tab N @tab
@item New @code{otherwise} clause as alias for @code{default} on metadirectives
      @tab N @tab
@item Deprecation of @code{default} clause on metadirectives @tab N @tab
@item Deprecation of delimited form of @code{declare target} @tab N @tab
@item Reproducible semantics changed for @code{order(concurrent)} @tab N @tab
@item @code{allocate} and @code{firstprivate} clauses on @code{scope}
      @tab Y @tab
@item @code{ompt_callback_work} @tab N @tab
@item Default map-type for the @code{map} clause in @code{target enter/exit data}
      @tab Y @tab
@item New @code{doacross} clause as alias for @code{depend} with
      @code{source}/@code{sink} modifier @tab Y @tab
@item Deprecation of @code{depend} with @code{source}/@code{sink} modifier
      @tab N @tab
@item @code{omp_cur_iteration} keyword @tab Y @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.2 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item For Fortran, optional comma between directive and clause @tab N @tab
@item Conforming device numbers and @code{omp_initial_device} and
      @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
@item Initial value of @var{default-device-var} ICV with
      @code{OMP_TARGET_OFFLOAD=mandatory} @tab Y @tab
@item @code{all} as @emph{implicit-behavior} for @code{defaultmap} @tab Y @tab
@item @emph{interop_types} in any position of the modifier list for the @code{init} clause
      of the @code{interop} construct @tab N @tab
@end multitable


@node OpenMP Technical Report 11
@section OpenMP Technical Report 11

Technical Report (TR) 11 is the first preview for OpenMP 6.0.

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
@multitable @columnfractions .60 .10 .25
@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
      @tab N/A @tab Backward compatibility
@item The @code{decl} attribute was added to the C++ attribute syntax
      @tab N @tab
@item @code{_ALL} suffix to the device-scope environment variables
      @tab P @tab Host device number wrongly accepted
@item For Fortran, @emph{locator list} can be also function reference with
      data pointer result @tab N @tab
@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
      @tab N @tab
@item Implicit reduction identifiers of C++ classes
      @tab N @tab
@item Change of the @emph{map-type} property from @emph{ultimate} to
      @emph{default} @tab N @tab
@item Concept of @emph{assumed-size arrays} in C and C++
      @tab N @tab
@item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
      @tab N @tab
@item @code{groupprivate} directive @tab N @tab
@item @code{local} clause to declare target directive @tab N @tab
@item @code{part_size} allocator trait @tab N @tab
@item @code{pin_device}, @code{preferred_device} and @code{target_access}
      allocator traits
      @tab N @tab
@item @code{access} allocator trait changes @tab N @tab
@item Extension of @code{interop} operation of @code{append_args}, allowing all
      modifiers of the @code{init} clause
      @tab N @tab
@item @code{interop} clause to @code{dispatch} @tab N @tab
@item @code{apply} code to loop-transforming constructs @tab N @tab
@item @code{omp_curr_progress_width} identifier @tab N @tab
@item @code{safesync} clause to the @code{parallel} construct @tab N @tab
@item @code{omp_get_max_progress_width} runtime routine @tab N @tab
@item @code{strict} modifier keyword to @code{num_threads} @tab N @tab
@item @code{memscope} clause to @code{atomic} and @code{flush} @tab N @tab
@item Routines for obtaining memory spaces/allocators for shared/device memory
      @tab N @tab
@item @code{omp_get_memspace_num_resources} routine @tab N @tab
@item @code{omp_get_submemspace} routine @tab N @tab
@item @code{ompt_get_buffer_limits} OMPT routine @tab N @tab
@item Extension of @code{OMP_DEFAULT_DEVICE} and new
      @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab
@item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
@end multitable

@unnumberedsubsec Other new TR 11 features
@multitable @columnfractions .60 .10 .25
@item Relaxed Fortran restrictions to the @code{aligned} clause @tab N @tab
@item Mapping lambda captures @tab N @tab
@item For Fortran, atomic compare with storing the comparison result
      @tab N @tab
@end multitable



@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 18 of the OpenMP
specification, version 5.2.

@menu
* Thread Team Routines::
* Thread Affinity Routines::
* Teams Region Routines::
* Tasking Routines::
@c * Resource Relinquishing Routines::
* Device Information Routines::
@c * Device Memory Routines::
* Lock Routines::
* Timing Routines::
* Event Routine::
@c * Interoperability Routines::
@c * Memory Management Routines::
@c * Tool Control Routine::
@c * Environment Display Routine::
@end menu



@node Thread Team Routines
@section Thread Team Routines

Routines controlling threads in the current contention group.
They have C linkage and do not throw exceptions.

@menu
* omp_set_num_threads::         Set upper team size limit
* omp_get_num_threads::         Size of the active team
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_get_dynamic::             Dynamic teams setting
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_set_nested::              Enable/disable nested parallel regions
* omp_get_nested::              Nested parallel regions
* omp_set_schedule::            Set the runtime scheduling method
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_teams_thread_limit::  Maximum number of threads imposed by teams
* omp_get_supported_active_levels:: Maximum number of active regions supported
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_get_max_active_levels::   Current maximum number of active regions
* omp_get_level::               Number of parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_team_size::           Number of threads in a team
* omp_get_active_level::        Number of active parallel regions
@end menu



@node omp_set_num_threads
@subsection @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
regions, if those do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item                   @tab @code{integer, intent(in) :: num_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table


@node omp_get_num_threads
@subsection @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential section of
the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{NUM_THREADS}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per online CPU is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table



@node omp_get_max_threads
@subsection @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@table @asis
@item @emph{Description}:
Returns the maximum number of threads that would be used for a parallel
region without a @code{num_threads} clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table



@node omp_get_thread_num
@subsection @code{omp_get_thread_num} -- Current thread ID
@table @asis
@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the primary thread of a team is always 0.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@end table



@node omp_in_parallel
@subsection @code{omp_in_parallel} -- Whether a parallel region is active
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@end table


@node omp_set_dynamic
@subsection @code{omp_set_dynamic} -- Enable/disable dynamic teams
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item @tab @code{logical, intent(in) :: dynamic_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@end table



@node omp_get_dynamic
@subsection @code{omp_get_dynamic} -- Dynamic teams setting
@table @asis
@item @emph{Description}:
This function returns @code{true} if dynamic adjustment of team sizes
is enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
@end multitable

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table



@node omp_get_cancellation
@subsection @code{omp_get_cancellation} -- Whether cancellation support is enabled
@table @asis
@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
@end multitable

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table



@node omp_set_nested
@subsection @code{omp_set_nested} -- Enable/disable nested parallel regions
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

Enabling nested parallel regions will also set the maximum number of
active nested regions to the maximum supported.  Disabling nested parallel
regions will set the maximum number of active nested regions to one.

Note that the @code{omp_set_nested} API routine was deprecated
in the OpenMP specification 5.2 in favor of @code{omp_set_max_active_levels}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}
@end multitable

@item @emph{See also}:
@ref{omp_get_nested}, @ref{omp_set_max_active_levels},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table



@node omp_get_nested
@subsection @code{omp_get_nested} -- Nested parallel regions
@table @asis
@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

The state of nested parallel regions at startup depends on several
environment variables.  If @env{OMP_MAX_ACTIVE_LEVELS} is defined
and is set to greater than one, then nested parallel regions will be
enabled.  If not defined, then the value of the @env{OMP_NESTED}
environment variable will be followed if defined.  If neither is
defined, then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND}
is defined with a list of more than one value, then nested parallel
regions are enabled.  If none of these are defined, then nested parallel
regions are disabled by default.

Nested parallel regions can be enabled or disabled at runtime using
@code{omp_set_nested}, or by setting the maximum number of nested
regions with @code{omp_set_max_active_levels} to one to disable, or
above one to enable.

Note that the @code{omp_get_nested} API routine was deprecated
in the OpenMP specification 5.2 in favor of @code{omp_get_max_active_levels}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_set_nested},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table



@node omp_set_schedule
@subsection @code{omp_set_schedule} -- Set the runtime scheduling method
@table @asis
@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
@code{omp_sched_auto}, the chunk size is set to the value of
@var{chunk_size} if positive, or to the default value if zero or negative.
For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_get_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
@end table



@node omp_get_schedule
@subsection @code{omp_get_schedule} -- Obtain the runtime scheduling method
@table @asis
@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@end table


@node omp_get_teams_thread_limit
@subsection @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
@table @asis
@item @emph{Description}:
Return the maximum number of threads that will be able to participate in
each team created by a teams construct.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
@end table



@node omp_get_supported_active_levels
@subsection @code{omp_get_supported_active_levels} -- Maximum number of active regions supported
@table @asis
@item @emph{Description}:
This function returns the maximum number of nested, active parallel regions
supported by this implementation.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
@end table



@node omp_set_max_active_levels
@subsection @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
@table @asis
@item @emph{Description}:
This function limits the maximum allowed number of nested, active
parallel regions.  @var{max_levels} must be less than or equal to
the value returned by @code{omp_get_supported_active_levels}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
@ref{omp_get_supported_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@end table



@node omp_get_max_active_levels
@subsection @code{omp_get_max_active_levels} -- Current maximum number of active regions
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel regions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table


@node omp_get_level
@subsection @code{omp_get_level} -- Obtain the current nesting level
@table @asis
@item @emph{Description}:
This function returns the nesting level of the parallel regions
enclosing the point of the call.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table



@node omp_get_ancestor_thread_num
@subsection @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
@table @asis
@item @emph{Description}:
This function returns the thread identification number for the given
nesting level of the current thread.  For values of @var{level} outside
the range from zero to @code{omp_get_level}, -1 is returned; if
@var{level} is @code{omp_get_level}, the result is identical to
@code{omp_get_thread_num}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table



@node omp_get_team_size
@subsection @code{omp_get_team_size} -- Number of threads in a team
@table @asis
@item @emph{Description}:
This function returns the number of threads in the thread team to which
either the current thread or its ancestor at the given nesting level
belongs.  For values of @var{level} outside the range from zero to
@code{omp_get_level}, -1 is returned; if @var{level} is zero, 1 is
returned, and for @code{omp_get_level}, the result is identical
to @code{omp_get_num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table



@node omp_get_active_level
@subsection @code{omp_get_active_level} -- Number of active parallel regions
@table @asis
@item @emph{Description}:
This function returns the nesting level of the active parallel regions
enclosing the point of the call.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table



@node Thread Affinity Routines
@section Thread Affinity Routines

Routines controlling and accessing thread-affinity policies.
They have C linkage and do not throw exceptions.

@menu
* omp_get_proc_bind:: Whether threads may be moved between CPUs
@c * omp_get_num_places:: <fixme>
@c * omp_get_place_num_procs:: <fixme>
@c * omp_get_place_proc_ids:: <fixme>
@c * omp_get_place_num:: <fixme>
@c * omp_get_partition_num_places:: <fixme>
@c * omp_get_partition_place_nums:: <fixme>
@c * omp_set_affinity_format:: <fixme>
@c * omp_get_affinity_format:: <fixme>
@c * omp_display_affinity:: <fixme>
@c * omp_capture_affinity:: <fixme>
@end menu



@node omp_get_proc_bind
@subsection @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
@table @asis
@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_primary},
@code{omp_proc_bind_master}, @code{omp_proc_bind_close} and @code{omp_proc_bind_spread},
where @code{omp_proc_bind_master} is an alias for @code{omp_proc_bind_primary}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
@end multitable

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table



@node Teams Region Routines
@section Teams Region Routines

Routines controlling the league of teams that are executed in a @code{teams}
region.  They have C linkage and do not throw exceptions.

@menu
* omp_get_num_teams:: Number of teams
* omp_get_team_num:: Get team number
* omp_set_num_teams:: Set upper teams limit for teams region
* omp_get_max_teams:: Maximum number of teams for teams region
* omp_set_teams_thread_limit:: Set upper thread limit for teams construct
* omp_get_thread_limit:: Maximum number of threads
@end menu



@node omp_get_num_teams
@subsection @code{omp_get_num_teams} -- Number of teams
@table @asis
@item @emph{Description}:
Returns the number of teams in the current teams region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table



@node omp_get_team_num
@subsection @code{omp_get_team_num} -- Get team number
@table @asis
@item @emph{Description}:
Returns the team number of the calling thread.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table



@node omp_set_num_teams
@subsection @code{omp_set_num_teams} -- Set upper teams limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound for the number of teams created by a teams
construct that does not specify a @code{num_teams} clause.  The
argument of @code{omp_set_num_teams} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_teams(int num_teams);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_teams(num_teams)}
@item @tab @code{integer, intent(in) :: num_teams}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_TEAMS}, @ref{omp_get_num_teams}, @ref{omp_get_max_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.3.
@end table



@node omp_get_max_teams
@subsection @code{omp_get_max_teams} -- Maximum number of teams of teams region
@table @asis
@item @emph{Description}:
Return the maximum number of teams used for a teams region
that does not use the clause @code{num_teams}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_teams}, @ref{omp_get_num_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
@end table



@node omp_set_teams_thread_limit
@subsection @code{omp_set_teams_thread_limit} -- Set upper thread limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound for the number of threads that will be available
for each team created by a teams construct that does not specify a
@code{thread_limit} clause.  The argument of
@code{omp_set_teams_thread_limit} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_teams_thread_limit(int thread_limit);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_teams_thread_limit(thread_limit)}
@item @tab @code{integer, intent(in) :: thread_limit}
@end multitable

@item @emph{See also}:
@ref{OMP_TEAMS_THREAD_LIMIT}, @ref{omp_get_teams_thread_limit}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.5.
@end table



@node omp_get_thread_limit
@subsection @code{omp_get_thread_limit} -- Maximum number of threads
@table @asis
@item @emph{Description}:
Return the maximum number of threads available to the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table
1365
1366
1367
506f068e
TB
1368@node Tasking Routines
1369@section Tasking Routines
1370
1371Routines relating to explicit tasks.
1372They have C linkage and do not throw exceptions.
1373
1374@menu
1375* omp_get_max_task_priority:: Maximum task priority value that can be set
819f3d36 1376* omp_in_explicit_task:: Whether a given task is an explicit task
506f068e
TB
1377* omp_in_final:: Whether in final or included task region
1378@end menu
1379
1380
1381
1382@node omp_get_max_task_priority
1383@subsection @code{omp_get_max_task_priority} -- Maximum priority value
1384that can be set for tasks.
d77de738
ML
1385@table @asis
1386@item @emph{Description}:
506f068e 1387This function obtains the maximum allowed priority number for tasks.
d77de738 1388
506f068e 1389@item @emph{C/C++}
d77de738 1390@multitable @columnfractions .20 .80
506f068e 1391@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
d77de738
ML
1392@end multitable
1393
1394@item @emph{Fortran}:
1395@multitable @columnfractions .20 .80
506f068e 1396@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
d77de738
ML
1397@end multitable
1398
1399@item @emph{Reference}:
506f068e 1400@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
d77de738
ML
1401@end table
1402
1403
506f068e 1404
819f3d36
TB
1405@node omp_in_explicit_task
1406@subsection @code{omp_in_explicit_task} -- Whether a given task is an explicit task
1407@table @asis
1408@item @emph{Description}:
1409The function returns the @var{explicit-task-var} ICV; it returns true when the
1410encountering task was generated by a task-generating construct such as
1411@code{target}, @code{task} or @code{taskloop}. Otherwise, the encountering task
1412is in an implicit task region such as generated by the implicit or explicit
1413@code{parallel} region and @code{omp_in_explicit_task} returns false.
1414
1415@item @emph{C/C++}
1416@multitable @columnfractions .20 .80
1417@item @emph{Prototype}: @tab @code{int omp_in_explicit_task(void);}
1418@end multitable
1419
1420@item @emph{Fortran}:
1421@multitable @columnfractions .20 .80
1422@item @emph{Interface}: @tab @code{logical function omp_in_explicit_task()}
1423@end multitable
1424
1425@item @emph{Reference}:
1426@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 18.5.2.
1427@end table
1428
1429
1430
d77de738 1431@node omp_in_final
506f068e 1432@subsection @code{omp_in_final} -- Whether in final or included task region
d77de738
ML
1433@table @asis
1434@item @emph{Description}:
1435This function returns @code{true} if currently running in a final
1436or included task region, @code{false} otherwise. Here, @code{true}
1437and @code{false} represent their language-specific counterparts.
1438
1439@item @emph{C/C++}:
1440@multitable @columnfractions .20 .80
1441@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
1442@end multitable
1443
1444@item @emph{Fortran}:
1445@multitable @columnfractions .20 .80
1446@item @emph{Interface}: @tab @code{logical function omp_in_final()}
1447@end multitable
1448
1449@item @emph{Reference}:
1450@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
1451@end table
1452
1453
1454
506f068e
TB
1455@c @node Resource Relinquishing Routines
1456@c @section Resource Relinquishing Routines
1457@c
1458@c Routines releasing resources used by the OpenMP runtime.
1459@c They have C linkage and do not throw exceptions.
1460@c
1461@c @menu
1462@c * omp_pause_resource:: <fixme>
1463@c * omp_pause_resource_all:: <fixme>
1464@c @end menu
1465
1466@node Device Information Routines
1467@section Device Information Routines
1468
1469Routines related to devices available to an OpenMP program.
1470They have C linkage and do not throw exceptions.
1471
1472@menu
1473* omp_get_num_procs:: Number of processors online
1474@c * omp_get_max_progress_width:: <fixme>/TR11
1475* omp_set_default_device:: Set the default device for target regions
1476* omp_get_default_device:: Get the default device for target regions
1477* omp_get_num_devices:: Number of target devices
1478* omp_get_device_num:: Get device that current thread is running on
1479* omp_is_initial_device:: Whether executing on the host device
1480* omp_get_initial_device:: Device number of host device
1481@end menu
1482
1483
1484
1485@node omp_get_num_procs
1486@subsection @code{omp_get_num_procs} -- Number of processors online
d77de738
ML
1487@table @asis
1488@item @emph{Description}:
506f068e 1489Returns the number of processors online on that device.
d77de738
ML
1490
1491@item @emph{C/C++}:
1492@multitable @columnfractions .20 .80
506f068e 1493@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
d77de738
ML
1494@end multitable
1495
1496@item @emph{Fortran}:
1497@multitable @columnfractions .20 .80
506f068e 1498@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
d77de738
ML
1499@end multitable
1500
1501@item @emph{Reference}:
506f068e 1502@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
d77de738
ML
1503@end table


@node omp_set_default_device
@subsection @code{omp_set_default_device} -- Set the default device for target regions
@table @asis
@item @emph{Description}:
Set the default device for target regions without a device clause. The
argument shall be a nonnegative device number.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item @tab @code{integer device_num}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table


@node omp_get_default_device
@subsection @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without a device clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table


@node omp_get_num_devices
@subsection @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table


@node omp_get_device_num
@subsection @code{omp_get_device_num} -- Return device number of current device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the device that the
current thread is executing on. For OpenMP 5.0, this must be equal to the
value returned by the @code{omp_get_initial_device} function when called
from the host.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_initial_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
@end table


@node omp_is_initial_device
@subsection @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise. Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table


@node omp_get_initial_device
@subsection @code{omp_get_initial_device} -- Return device number of initial device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the host device.
For OpenMP 5.1, this must be equal to the value returned by the
@code{omp_get_num_devices} function.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_devices}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
@end table


@c @node Device Memory Routines
@c @section Device Memory Routines
@c
@c Routines related to memory allocation and managing corresponding
@c pointers on devices. They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_target_alloc:: <fixme>
@c * omp_target_free:: <fixme>
@c * omp_target_is_present:: <fixme>
@c * omp_target_is_accessible:: <fixme>
@c * omp_target_memcpy:: <fixme>
@c * omp_target_memcpy_rect:: <fixme>
@c * omp_target_memcpy_async:: <fixme>
@c * omp_target_memcpy_rect_async:: <fixme>
@c * omp_target_associate_ptr:: <fixme>
@c * omp_target_disassociate_ptr:: <fixme>
@c * omp_get_mapped_ptr:: <fixme>
@c @end menu

@node Lock Routines
@section Lock Routines

Initialize, set, test, unset and destroy simple and nested locks.
The routines have C linkage and do not throw exceptions.

@menu
* omp_init_lock:: Initialize simple lock
* omp_init_nest_lock:: Initialize nested lock
@c * omp_init_lock_with_hint:: <fixme>
@c * omp_init_nest_lock_with_hint:: <fixme>
* omp_destroy_lock:: Destroy simple lock
* omp_destroy_nest_lock:: Destroy nested lock
* omp_set_lock:: Wait for and set simple lock
* omp_set_nest_lock:: Wait for and set nested lock
* omp_unset_lock:: Unset simple lock
* omp_unset_nest_lock:: Unset nested lock
* omp_test_lock:: Test and set simple lock if available
* omp_test_nest_lock:: Test and set nested lock if available
@end menu



@node omp_init_lock
@subsection @code{omp_init_lock} -- Initialize simple lock
@table @asis
@item @emph{Description}:
Initialize a simple lock. After initialization, the lock is in
an unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table


@node omp_init_nest_lock
@subsection @code{omp_init_nest_lock} -- Initialize nested lock
@table @asis
@item @emph{Description}:
Initialize a nested lock. After initialization, the lock is in
an unlocked state and the nesting count is set to zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table


@node omp_destroy_lock
@subsection @code{omp_destroy_lock} -- Destroy simple lock
@table @asis
@item @emph{Description}:
Destroy a simple lock. In order to be destroyed, a simple lock must be
in the unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table


@node omp_destroy_nest_lock
@subsection @code{omp_destroy_nest_lock} -- Destroy nested lock
@table @asis
@item @emph{Description}:
Destroy a nested lock. In order to be destroyed, a nested lock must be
in the unlocked state and its nesting count must equal zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table


@node omp_set_lock
@subsection @code{omp_set_lock} -- Wait for and set simple lock
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread,
a deadlock occurs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table


@node omp_set_nest_lock
@subsection @code{omp_set_nest_lock} -- Wait for and set nested lock
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread, the
nesting count for the lock is incremented.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table


@node omp_unset_lock
@subsection @code{omp_unset_lock} -- Unset simple lock
@table @asis
@item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}. Then, the lock becomes unlocked. If one
or more threads attempted to set the lock before, one of them is chosen to
acquire the lock.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_set_lock}, @ref{omp_test_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table


@node omp_unset_nest_lock
@subsection @code{omp_unset_nest_lock} -- Unset nested lock
@table @asis
@item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the lock before,
one of them is chosen to acquire the lock.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_set_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table


@node omp_test_lock
@subsection @code{omp_test_lock} -- Test and set simple lock if available
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
does not block if the lock is not available. This function returns
@code{true} upon success, @code{false} otherwise. Here, @code{true} and
@code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table


@node omp_test_nest_lock
@subsection @code{omp_test_nest_lock} -- Test and set nested lock if available
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
@code{omp_test_nest_lock} does not block if the lock is not available.
If the lock is already held by the current thread, the new nesting count
is returned. Otherwise, the return value equals zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table


@node Timing Routines
@section Timing Routines

Portable, thread-based, wall clock timer.
The routines have C linkage and do not throw exceptions.

@menu
* omp_get_wtick:: Get timer precision.
* omp_get_wtime:: Elapsed wall clock time.
@end menu


@node omp_get_wtick
@subsection @code{omp_get_wtick} -- Get timer precision
@table @asis
@item @emph{Description}:
Gets the timer precision, i.e., the number of seconds between two
successive clock ticks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtime}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
@end table


@node omp_get_wtime
@subsection @code{omp_get_wtime} -- Elapsed wall clock time
@table @asis
@item @emph{Description}:
Elapsed wall clock time in seconds. The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from some ``time in the past'', which is an arbitrary time
guaranteed not to change during the execution of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtick}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
@end table


@node Event Routine
@section Event Routine

Support for event objects.
The routine has C linkage and does not throw exceptions.

@menu
* omp_fulfill_event:: Fulfill and destroy an OpenMP event.
@end menu


@node omp_fulfill_event
@subsection @code{omp_fulfill_event} -- Fulfill and destroy an OpenMP event
@table @asis
@item @emph{Description}:
Fulfill the event associated with the event handle argument. Currently, it
is only used to fulfill events generated by detach clauses on task
constructs; the effect of fulfilling the event is to allow the task to
complete.

The result of calling @code{omp_fulfill_event} with an event handle other
than that generated by a detach clause is undefined. Calling it with an
event handle that has already been fulfilled is also undefined.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_fulfill_event(omp_event_handle_t event);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_fulfill_event(event)}
@item @tab @code{integer (kind=omp_event_handle_kind) :: event}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.5.1.
@end table


@c @node Interoperability Routines
@c @section Interoperability Routines
@c
@c Routines to obtain properties from an @code{omp_interop_t} object.
@c They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_get_num_interop_properties:: <fixme>
@c * omp_get_interop_int:: <fixme>
@c * omp_get_interop_ptr:: <fixme>
@c * omp_get_interop_str:: <fixme>
@c * omp_get_interop_name:: <fixme>
@c * omp_get_interop_type_desc:: <fixme>
@c * omp_get_interop_rc_desc:: <fixme>
@c @end menu

@c @node Memory Management Routines
@c @section Memory Management Routines
@c
@c Routines to manage and allocate memory on the current device.
@c They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_init_allocator:: <fixme>
@c * omp_destroy_allocator:: <fixme>
@c * omp_set_default_allocator:: <fixme>
@c * omp_get_default_allocator:: <fixme>
@c * omp_alloc:: <fixme>
@c * omp_aligned_alloc:: <fixme>
@c * omp_free:: <fixme>
@c * omp_calloc:: <fixme>
@c * omp_aligned_calloc:: <fixme>
@c * omp_realloc:: <fixme>
@c * omp_get_memspace_num_resources:: <fixme>/TR11
@c * omp_get_submemspace:: <fixme>/TR11
@c @end menu

@c @node Tool Control Routine
@c
@c FIXME

@c @node Environment Display Routine
@c @section Environment Display Routine
@c
@c Routine to display the OpenMP version number and the initial value of ICVs.
@c It has C linkage and does not throw exceptions.
@c
@c menu
@c * omp_display_env:: <fixme>
@c end menu


@c ---------------------------------------------------------------------
@c OpenMP Environment Variables
@c ---------------------------------------------------------------------

@node Environment Variables
@chapter OpenMP Environment Variables

Environment variables beginning with @env{OMP_} are defined by
section 4 of the OpenMP specification in version 4.5 or in a later version
of the specification, while those beginning with @env{GOMP_} are GNU extensions.
Most @env{OMP_} environment variables have an associated internal control
variable (ICV).

For any OpenMP environment variable that sets an ICV and is neither
@code{OMP_DEFAULT_DEVICE} nor has global ICV scope, associated
device-specific environment variables exist. For them, the environment
variable without suffix affects the host. The suffix @code{_DEV_} followed
by a non-negative device number less than the number of available devices sets
the ICV for the corresponding device. The suffix @code{_DEV} sets the ICV
of all non-host devices for which a device-specific corresponding environment
variable has not been set, while the @code{_ALL} suffix sets the ICV of all
host and non-host devices for which a more specific corresponding environment
variable is not set.

@menu
* OMP_ALLOCATOR:: Set the default allocator
* OMP_AFFINITY_FORMAT:: Set the format string used for affinity display
* OMP_CANCELLATION:: Set whether cancellation is activated
* OMP_DISPLAY_AFFINITY:: Display thread affinity information
* OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
* OMP_DEFAULT_DEVICE:: Set the device used in target regions
* OMP_DYNAMIC:: Dynamic adjustment of threads
* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
* OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
* OMP_NESTED:: Nested parallel regions
* OMP_NUM_TEAMS:: Specifies the number of teams to use by teams region
* OMP_NUM_THREADS:: Specifies the number of threads to use
* OMP_PROC_BIND:: Whether threads may be moved between CPUs
* OMP_PLACES:: Specifies on which CPUs the threads should be placed
* OMP_STACKSIZE:: Set default thread stack size
* OMP_SCHEDULE:: How threads are scheduled
* OMP_TARGET_OFFLOAD:: Controls offloading behaviour
* OMP_TEAMS_THREAD_LIMIT:: Set the maximum number of threads imposed by teams
* OMP_THREAD_LIMIT:: Set the maximum number of threads
* OMP_WAIT_POLICY:: How waiting threads are handled
* GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
* GOMP_DEBUG:: Enable debugging output
* GOMP_STACKSIZE:: Set default thread stack size
* GOMP_SPINCOUNT:: Set the busy-wait spin count
* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
@end menu



@node OMP_ALLOCATOR
@section @env{OMP_ALLOCATOR} -- Set the default allocator
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{def-allocator-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Sets the default allocator that is used when no allocator has been specified
in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
routine is invoked with the @code{omp_null_allocator} allocator.
If unset, @code{omp_default_mem_alloc} is used.

The value can either be a predefined allocator or a predefined memory space
or a predefined memory space followed by a colon and a comma-separated list
of memory trait and value pairs, separated by @code{=}.

Note: The corresponding device environment variables are currently not
supported. Therefore, the non-host @var{def-allocator-var} ICVs are always
initialized to @code{omp_default_mem_alloc}. However, on all devices,
the @code{omp_set_default_allocator} API routine can be used to change the
value.

@multitable @columnfractions .45 .45
@headitem Predefined allocators @tab Associated predefined memory spaces
@item omp_default_mem_alloc @tab omp_default_mem_space
@item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
@item omp_const_mem_alloc @tab omp_const_mem_space
@item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
@item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
@item omp_cgroup_mem_alloc @tab --
@item omp_pteam_mem_alloc @tab --
@item omp_thread_mem_alloc @tab --
@end multitable

The predefined allocators use the default values for the traits,
as listed below, except that the last three allocators have the
@code{access} trait set to @code{cgroup}, @code{pteam}, and
@code{thread}, respectively.

@multitable @columnfractions .25 .40 .25
@headitem Trait @tab Allowed values @tab Default value
@item @code{sync_hint} @tab @code{contended}, @code{uncontended},
      @code{serialized}, @code{private}
      @tab @code{contended}
@item @code{alignment} @tab Positive integer being a power of two
      @tab 1 byte
@item @code{access} @tab @code{all}, @code{cgroup},
      @code{pteam}, @code{thread}
      @tab @code{all}
@item @code{pool_size} @tab Positive integer
      @tab See @ref{Memory allocation}
@item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
      @code{abort_fb}, @code{allocator_fb}
      @tab See below
@item @code{fb_data} @tab @emph{unsupported as it needs an allocator handle}
      @tab (none)
@item @code{pinned} @tab @code{true}, @code{false}
      @tab @code{false}
@item @code{partition} @tab @code{environment}, @code{nearest},
      @code{blocked}, @code{interleaved}
      @tab @code{environment}
@end multitable

For the @code{fallback} trait, the default value is @code{null_fb} for the
@code{omp_default_mem_alloc} allocator and any allocator that is associated
with device memory; for all other allocators, it is @code{default_mem_fb}
by default.

Examples:
@smallexample
OMP_ALLOCATOR=omp_high_bw_mem_alloc
OMP_ALLOCATOR=omp_large_cap_mem_space
OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest
@end smallexample

@item @emph{See also}:
@ref{Memory allocation}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
@end table


@node OMP_AFFINITY_FORMAT
@section @env{OMP_AFFINITY_FORMAT} -- Set the format string used for affinity display
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{affinity-format-var}
@item @emph{Scope:} device
@item @emph{Description}:
Sets the format string used when displaying OpenMP thread affinity information.
Special values are output using @code{%} followed by an optional size
specification and then either the single-character field type or its long
name enclosed in curly braces; using @code{%%} will display a literal percent.
The size specification consists of an optional @code{0.} or @code{.} followed
by a positive integer, specifying the minimal width of the output. With
@code{0.} and numerical values, the output is padded with zeros on the left;
with @code{.}, the output is padded by spaces on the left; otherwise, the
output is padded by spaces on the right. If unset, the value is
``@code{level %L thread %i affinity %A}''.

Supported field types are:

@multitable @columnfractions .10 .25 .60
@item t @tab team_num @tab value returned by @code{omp_get_team_num}
@item T @tab num_teams @tab value returned by @code{omp_get_num_teams}
@item L @tab nesting_level @tab value returned by @code{omp_get_level}
@item n @tab thread_num @tab value returned by @code{omp_get_thread_num}
@item N @tab num_threads @tab value returned by @code{omp_get_num_threads}
@item a @tab ancestor_tnum
      @tab value returned by
           @code{omp_get_ancestor_thread_num(omp_get_level()-1)}
@item H @tab host @tab name of the host that executes the thread
@item P @tab process_id @tab process identifier
@item i @tab native_thread_id @tab native thread identifier
@item A @tab thread_affinity
      @tab comma separated list of integer values or ranges, representing the
           processors on which a process might execute, subject to affinity
           mechanisms
@end multitable

For instance, after setting

@smallexample
OMP_AFFINITY_FORMAT="%0.2a!%n!%.4L!%N;%.2t;%0.2T;%@{team_num@};%@{num_teams@};%A"
@end smallexample

with either @code{OMP_DISPLAY_AFFINITY} being set or when calling
@code{omp_display_affinity} with @code{NULL} or an empty string, the program
might display the following:

@smallexample
00!0! 1!4; 0;01;0;1;0-11
00!3! 1!4; 0;01;0;1;0-11
00!2! 1!4; 0;01;0;1;0-11
00!1! 1!4; 0;01;0;1;0-11
@end smallexample

@item @emph{See also}:
@ref{OMP_DISPLAY_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.14
@end table


@node OMP_CANCELLATION
@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{cancel-var}
@item @emph{Scope:} global
@item @emph{Description}:
If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
if unset, cancellation is disabled and the @code{cancel} construct is ignored.

@item @emph{See also}:
@ref{omp_get_cancellation}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
@end table


@node OMP_DISPLAY_AFFINITY
@section @env{OMP_DISPLAY_AFFINITY} -- Display thread affinity information
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{display-affinity-var}
@item @emph{Scope:} global
@item @emph{Description}:
If set to @code{FALSE} or if unset, affinity displaying is disabled.
If set to @code{TRUE}, the runtime will display affinity information about
OpenMP threads in a parallel region upon entering the region and every time
any change occurs.

@item @emph{See also}:
@ref{OMP_AFFINITY_FORMAT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.13
@end table


@node OMP_DISPLAY_ENV
@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
@cindex Environment Variable
@table @asis
@item @emph{ICV:} none
@item @emph{Scope:} not applicable
@item @emph{Description}:
If set to @code{TRUE}, the OpenMP version number and the values
associated with the OpenMP environment variables are printed to @code{stderr}.
If set to @code{VERBOSE}, it additionally shows the value of the environment
variables which are GNU extensions. If undefined or set to @code{FALSE},
this information will not be shown.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
@end table


@node OMP_DEFAULT_DEVICE
@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{default-device-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Set to choose the device which is used in a @code{target} region, unless the
value is overridden by @code{omp_set_default_device} or by a @code{device}
clause. The value shall be a nonnegative device number. If no device with
the given device number exists, the code is executed on the host. If unset
and @env{OMP_TARGET_OFFLOAD} is @code{MANDATORY} while no non-host devices
are available, the value is set to @code{omp_invalid_device}; otherwise, if
unset, device number 0 will be used.

@item @emph{See also}:
@ref{omp_get_default_device}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
@end table


@node OMP_DYNAMIC
@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{dyn-var}
@item @emph{Scope:} global
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team. The value of this environment variable shall be
@code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
disabled by default.

@item @emph{See also}:
@ref{omp_set_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
@end table


@node OMP_MAX_ACTIVE_LEVELS
@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{max-active-levels-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies the initial value for the maximum number of nested parallel
regions. The value of this variable shall be a positive integer.
If undefined, then if @env{OMP_NESTED} is defined and set to true, or
if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
a list with more than one item, the maximum number of nested parallel
regions will be initialized to the largest number supported, otherwise
it will be set to one.

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{OMP_NESTED}, @ref{OMP_PROC_BIND},
@ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
@end table


@node OMP_MAX_TASK_PRIORITY
@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority number that can be set for a task
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{max-task-priority-var}
@item @emph{Scope:} global
@item @emph{Description}:
Specifies the initial value for the maximum priority value that can be
set for a task. The value of this variable shall be a non-negative
integer, and zero is allowed. If undefined, the default priority is 0.

@item @emph{See also}:
@ref{omp_get_max_task_priority}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
@end table


@node OMP_NESTED
@section @env{OMP_NESTED} -- Nested parallel regions
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{max-active-levels-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams. The value of this environment variable
shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the maximum
number of active nested regions supported is by default set to the
maximum supported; otherwise it is set to one. If
@env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting overrides this
setting. If both are undefined, nested parallel regions are enabled if
@env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to a list
with more than one item, otherwise they are disabled by default.

Note that the @code{OMP_NESTED} environment variable was deprecated in
the OpenMP specification 5.2 in favor of @code{OMP_MAX_ACTIVE_LEVELS}.

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_set_nested},
@ref{OMP_MAX_ACTIVE_LEVELS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
@end table


@node OMP_NUM_TEAMS
@section @env{OMP_NUM_TEAMS} -- Specifies the number of teams to use by teams region
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{nteams-var}
@item @emph{Scope:} device
@item @emph{Description}:
Specifies the upper bound for the number of teams to use in teams regions
without an explicit @code{num_teams} clause. The value of this variable
shall be a positive integer. If undefined, it defaults to 0, which means an
implementation-defined upper bound.

@item @emph{See also}:
@ref{omp_set_num_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.23
@end table


@node OMP_NUM_THREADS
@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{nthreads-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies the default number of threads to use in parallel regions. The
value of this variable shall be a comma-separated list of positive integers;
each value specifies the number of threads to use for the corresponding
nesting level. Specifying more than one item in the list will automatically
enable nesting by default. If undefined, one thread per CPU is used.

When a list with more than one value is specified, it also affects the
@var{max-active-levels-var} ICV as described in @ref{OMP_MAX_ACTIVE_LEVELS}.

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{OMP_MAX_ACTIVE_LEVELS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
@end table


@node OMP_PROC_BIND
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{bind-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
they may be moved. Alternatively, a comma-separated list with the
values @code{PRIMARY}, @code{MASTER}, @code{CLOSE} and @code{SPREAD} can
be used to specify the thread affinity policy for the corresponding nesting
level. With @code{PRIMARY} and @code{MASTER} the worker threads are in the
same place partition as the primary thread. With @code{CLOSE} they are
kept close to the primary thread in contiguous place partitions. And
with @code{SPREAD} a sparse distribution across the place partitions is
used. Specifying more than one item in the list will automatically enable
nesting by default.

When a list is specified, it also affects the @var{max-active-levels-var} ICV
as described in @ref{OMP_MAX_ACTIVE_LEVELS}.

When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.

@item @emph{See also}:
@ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY}, @ref{OMP_PLACES},
@ref{OMP_MAX_ACTIVE_LEVELS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@end table


@node OMP_PLACES
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{place-partition-var}
@item @emph{Scope:} implicit tasks
@item @emph{Description}:
The thread placement can be either specified using an abstract name or by an
explicit list of the places. The abstract names @code{threads}, @code{cores},
@code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
followed by a positive number in parentheses, which denotes how many places
shall be created. With @code{threads} each place corresponds to a single
hardware thread; with @code{cores} to a single core with the corresponding
number of hardware threads; with @code{sockets} the place corresponds to a
single socket; with @code{ll_caches} to a set of cores that shares the last
level cache on the device; and with @code{numa_domains} to a set of cores
whose closest memory on the device is the same memory and at a similar
distance from the cores. The resulting placement can be shown by setting the
@env{OMP_DISPLAY_ENV} environment variable.

Alternatively, the placement can be specified explicitly as a comma-separated
list of places. A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads. The curly braces can be omitted
when only a single number has been specified. The hardware threads
belonging to a place can either be specified as a comma-separated list of
nonnegative thread numbers or using an interval. Multiple places can also be
either specified by a comma-separated list of places or by an interval. To
specify an interval, a colon followed by the count is placed after
the hardware thread number or the place. Optionally, the length can be
followed by a colon and the stride number -- otherwise a unit stride is
assumed. Placing an exclamation mark (@code{!}) directly before a curly
brace or numbers inside the curly braces (excluding intervals) will
exclude those hardware threads.

For instance, the following specifies the same places list:
@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.

If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
between CPUs following no placement policy.

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
@ref{OMP_DISPLAY_ENV}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
@end table


@node OMP_STACKSIZE
@section @env{OMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{stacksize-var}
@item @emph{Scope:} device
@item @emph{Description}:
Set the default thread stack size in kilobytes, unless the number
is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
case the size is, respectively, in bytes, kilobytes, megabytes
or gigabytes. This is different from @code{pthread_attr_setstacksize}
which gets the number of bytes as an argument. If the stack size cannot
be set due to system constraints, an error is reported and the initial
stack size is left unchanged. If undefined, the stack size is system
dependent.

@item @emph{See also}:
@ref{GOMP_STACKSIZE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
@end table


@node OMP_SCHEDULE
@section @env{OMP_SCHEDULE} -- How threads are scheduled
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{run-sched-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or
@code{auto}. The optional @code{chunk} size shall be a positive integer.
If undefined, dynamic scheduling and a chunk size of 1 are used.

@item @emph{See also}:
@ref{omp_set_schedule}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
@end table


@node OMP_TARGET_OFFLOAD
@section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behaviour
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{target-offload-var}
@item @emph{Scope:} global
@item @emph{Description}:
Specifies the behaviour with regard to offloading code to a device. This
variable can be set to one of three values: @code{MANDATORY}, @code{DISABLED}
or @code{DEFAULT}.

If set to @code{MANDATORY}, the program will terminate with an error if
the offload device is not present or is not supported. If set to
@code{DISABLED}, offloading is disabled and all code will run on the
host. If set to @code{DEFAULT}, the program will try offloading to the
device first, then fall back to running code on the host if it cannot.

If undefined, the program will behave as if @code{DEFAULT} was set.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.17
@end table


@node OMP_TEAMS_THREAD_LIMIT
@section @env{OMP_TEAMS_THREAD_LIMIT} -- Set the maximum number of threads imposed by teams
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{teams-thread-limit-var}
@item @emph{Scope:} device
@item @emph{Description}:
Specifies an upper bound for the number of threads used by each contention
group created by a teams construct without an explicit @code{thread_limit}
clause. The value of this variable shall be a positive integer. If undefined,
the value 0 is used, which stands for an implementation-defined upper
limit.

@item @emph{See also}:
@ref{OMP_THREAD_LIMIT}, @ref{omp_set_teams_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.24
@end table


@node OMP_THREAD_LIMIT
@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{thread-limit-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies the number of threads to use for the whole program. The
value of this variable shall be a positive integer. If undefined,
the number of threads is not limited.

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
@end table


@node OMP_WAIT_POLICY
@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that they
should. If undefined, threads wait actively for a short time
before waiting passively.

@item @emph{See also}:
@ref{GOMP_SPINCOUNT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
@end table


@node GOMP_CPU_AFFINITY
@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Binds threads to specific CPUs. The variable should contain a space-separated
or comma-separated list of CPUs. This list may contain different kinds of
entries: either single CPU numbers in any order, a range of CPUs (M-N)
or a range with some stride (M-N:S). CPU numbers are zero-based. For example,
@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
and 14 respectively, and then start assigning back from the beginning of
the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.

There is no libgomp library routine to determine whether a CPU affinity
specification is in effect. As a workaround, language-specific library
functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
environment variable. A CPU affinity defined at startup cannot be changed
or disabled during the runtime of the application.

If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence. If neither is set, or when
@env{OMP_PROC_BIND} is set to @code{FALSE}, the host system will handle the
assignment of threads to CPUs.

@item @emph{See also}:
@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
@end table


@node GOMP_DEBUG
@section @env{GOMP_DEBUG} -- Enable debugging output
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable debugging output. The variable should be set to @code{0}
(disabled, also the default if not set), or @code{1} (enabled).

If enabled, some debugging output will be printed during execution.
This is currently not specified in more detail, and subject to change.
@end table


@node GOMP_STACKSIZE
@section @env{GOMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes. This is different from
@code{pthread_attr_setstacksize} which gets the number of bytes as an
argument. If the stack size cannot be set due to system constraints, an
error is reported and the initial stack size is left unchanged. If undefined,
the stack size is system dependent.

@item @emph{See also}:
@ref{OMP_STACKSIZE}

@item @emph{Reference}:
@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
GCC Patches Mailinglist},
@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
GCC Patches Mailinglist}
@end table


@node GOMP_SPINCOUNT
@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power. The value may be
either @code{INFINITE} or @code{INFINITY}, to always wait actively, or an
integer which gives the number of spins of the busy-wait loop. The
integer may optionally be followed by the following suffixes acting
as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
million), @code{G} (giga, billion), or @code{T} (tera, trillion).
If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless @env{GOMP_SPINCOUNT} is lower
or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.

@item @emph{See also}:
@ref{OMP_WAIT_POLICY}
@end table


@node GOMP_RTEMS_THREAD_POOLS
@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
This environment variable is only used on the RTEMS real-time operating system.
It determines the scheduler instance specific thread pools. The format for
@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
separated by @code{:} where:
@itemize @bullet
@item @code{<thread-pool-count>} is the thread pool count for this scheduler
instance.
@item @code{$<priority>} is an optional priority for the worker threads of a
thread pool according to @code{pthread_setschedparam}. In case a priority
value is omitted, then a worker thread will inherit the priority of the OpenMP
primary thread that created it. The priority of the worker thread is not
changed after creation, even if a new OpenMP primary thread using the worker has
a different priority.
@item @code{@@<scheduler-name>} is the scheduler instance name according to the
RTEMS application configuration.
@end itemize
In case no thread pool configuration is specified for a scheduler instance,
then each OpenMP primary thread of this scheduler instance will use its own
dynamically allocated thread pool. To limit the worker thread count of the
thread pools, each OpenMP primary thread must call @code{omp_set_num_threads}.
@item @emph{Example}:
Let us suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
one thread pool available. Since no priority is specified for this scheduler
instance, the worker thread inherits the priority of the OpenMP primary thread
that created it. In the scheduler instance @code{WRK1} there are three thread
pools available and their worker threads run at priority four.
@end table


@c ---------------------------------------------------------------------
@c Enabling OpenACC
@c ---------------------------------------------------------------------

@node Enabling OpenACC
@chapter Enabling OpenACC

To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
flag @option{-fopenacc} must be specified. This enables the OpenACC directive
@code{#pragma acc} in C/C++ and, for Fortran, the @code{!$acc} directives in
free form, the @code{c$acc}, @code{*$acc} and @code{!$acc} directives in
fixed form, the @code{!$} conditional compilation sentinels in free form and
the @code{c$}, @code{*$} and @code{!$} sentinels in fixed form. The flag also
arranges for automatic linking of the OpenACC runtime library
(@ref{OpenACC Runtime Library Routines}).

See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.

A complete description of all OpenACC directives accepted may be found in
the @uref{https://www.openacc.org, OpenACC} Application Programming
Interface manual, version 2.6.


@c ---------------------------------------------------------------------
@c OpenACC Runtime Library Routines
@c ---------------------------------------------------------------------

@node OpenACC Runtime Library Routines
@chapter OpenACC Runtime Library Routines

The runtime routines described here are defined by section 3 of the OpenACC
specification, version 2.6.
They have C linkage and do not throw exceptions.
Generally, they are available only for the host, with the exception of
@code{acc_on_device}, which is available for both the host and the
acceleration device.

@menu
* acc_get_num_devices::         Get number of devices for the given device
                                type.
* acc_set_device_type::         Set type of device accelerator to use.
* acc_get_device_type::         Get type of device accelerator to be used.
* acc_set_device_num::          Set device number to use.
* acc_get_device_num::          Get device number to be used.
* acc_get_property::            Get device property.
* acc_async_test::              Test for completion of a specific asynchronous
                                operation.
* acc_async_test_all::          Test for completion of all asynchronous
                                operations.
* acc_wait::                    Wait for completion of a specific asynchronous
                                operation.
* acc_wait_all::                Wait for completion of all asynchronous
                                operations.
* acc_wait_all_async::          Wait for completion of all asynchronous
                                operations.
* acc_wait_async::              Wait for completion of asynchronous operations.
* acc_init::                    Initialize runtime for a specific device type.
* acc_shutdown::                Shut down the runtime for a specific device
                                type.
* acc_on_device::               Whether executing on a particular device.
* acc_malloc::                  Allocate device memory.
* acc_free::                    Free device memory.
* acc_copyin::                  Allocate device memory and copy host memory to
                                it.
* acc_present_or_copyin::       If the data is not present on the device,
                                allocate device memory and copy from host
                                memory.
* acc_create::                  Allocate device memory and map it to host
                                memory.
* acc_present_or_create::       If the data is not present on the device,
                                allocate device memory and map it to host
                                memory.
* acc_copyout::                 Copy device memory to host memory.
* acc_delete::                  Free device memory.
* acc_update_device::           Update device memory from mapped host memory.
* acc_update_self::             Update host memory from mapped device memory.
* acc_map_data::                Map previously allocated device memory to host
                                memory.
* acc_unmap_data::              Unmap device memory from host memory.
* acc_deviceptr::               Get device pointer associated with specific
                                host address.
* acc_hostptr::                 Get host pointer associated with specific
                                device address.
* acc_is_present::              Indicate whether host variable / array is
                                present on device.
* acc_memcpy_to_device::        Copy host memory to device memory.
* acc_memcpy_from_device::      Copy device memory to host memory.
* acc_attach::                  Let device pointer point to device-pointer
                                target.
* acc_detach::                  Let device pointer point to host-pointer
                                target.

API routines for target platforms.

* acc_get_current_cuda_device:: Get CUDA device handle.
* acc_get_current_cuda_context:: Get CUDA context handle.
* acc_get_cuda_stream::         Get CUDA stream handle.
* acc_set_cuda_stream::         Set CUDA stream handle.

API routines for the OpenACC Profiling Interface.

* acc_prof_register::           Register callbacks.
* acc_prof_unregister::         Unregister callbacks.
* acc_prof_lookup::             Obtain inquiry functions.
* acc_register_library::        Library registration.
@end menu


@node acc_get_num_devices
@section @code{acc_get_num_devices} -- Get number of devices for given device type
@table @asis
@item @emph{Description}
This function returns a value indicating the number of devices available
for the device type specified in @var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.1.
@end table


@node acc_set_device_type
@section @code{acc_set_device_type} -- Set type of device accelerator to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime library which device type, specified
in @var{devicetype}, to use when executing a parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_set_device_type(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.2.
@end table



@node acc_get_device_type
@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
@table @asis
@item @emph{Description}
This function returns what device type will be used when executing a
parallel or kernels region.

This function returns @code{acc_device_none} if
@code{acc_get_device_type} is called from the
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
Interface}), that is, if the device is currently being initialized.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.3.
@end table



@node acc_set_device_num
@section @code{acc_set_device_num} -- Set device number to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{devicenum} and associated with the specified device type
@var{devicetype}, to use when executing a parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_num(int devicenum, acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.4.
@end table



@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.
@table @asis
@item @emph{Description}
This function returns which device number, associated with the specified
device type @var{devicetype}, will be used when executing a parallel or
kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer acc_get_device_num}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.5.
@end table



@node acc_get_property
@section @code{acc_get_property} -- Get device property.
@cindex acc_get_property
@cindex acc_get_property_string
@table @asis
@item @emph{Description}
These routines return the value of the specified @var{property} for the
device specified by @var{devicenum} and @var{devicetype}.
Integer-valued and string-valued properties are returned by
@code{acc_get_property} and @code{acc_get_property_string} respectively.
The Fortran @code{acc_get_property_string} subroutine returns the string
retrieved in its fourth argument, while the remaining entry points are
functions, which pass the return value as their result.

Note for Fortran only: the OpenACC technical committee corrected and, hence,
modified the interface introduced in OpenACC 2.6. The kind-value parameter
@code{acc_device_property} has been renamed to @code{acc_device_property_kind}
for consistency and the return type of the @code{acc_get_property} function is
now a @code{c_size_t} integer instead of an @code{acc_device_property} integer.
The parameter @code{acc_device_property} will continue to be provided,
but might be removed in a future version of GCC.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
@item @tab @code{use ISO_C_Binding, only: c_size_t}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer(kind=acc_device_property_kind) property}
@item @tab @code{integer(kind=c_size_t) acc_get_property}
@item @tab @code{character(*) string}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.6.
@end table



@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}. In C/C++, a non-zero value is returned if the specified
asynchronous operation has completed and zero if it has not. In Fortran,
@code{true} is returned if the operation has completed and @code{false}
if it has not.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item @tab @code{integer(kind=acc_handle_kind) arg}
@item @tab @code{logical acc_async_test}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.9.
@end table



@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned if all asynchronous operations
have completed and zero if any has not. In Fortran, @code{true} is
returned if all operations have completed and @code{false} otherwise.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.10.
@end table



@node acc_wait
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.11.
@end table



@node acc_wait_all
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.13.
@end table



@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
@item @tab @code{integer(acc_handle_kind) async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.14.
@end table



@node acc_wait_async
@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on queue @var{async} for any and all
asynchronous operations enqueued on queue @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
@item @tab @code{integer(acc_handle_kind) arg, async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.12.
@end table



@node acc_init
@section @code{acc_init} -- Initialize runtime for a specific device type.
@table @asis
@item @emph{Description}
This function initializes the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.7.
@end table



@node acc_shutdown
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
@table @asis
@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.8.
@end table



@node acc_on_device
@section @code{acc_on_device} -- Whether executing on a particular device
@table @asis
@item @emph{Description}:
This function returns whether the program is executing on a particular
device type, specified in @var{devicetype}. In C/C++, a non-zero value
is returned if the program is executing on the specified device type,
and zero otherwise. In Fortran, @code{true} is returned in the former
case and @code{false} in the latter.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@item @tab @code{logical acc_on_device}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.17.
@end table



@node acc_malloc
@section @code{acc_malloc} -- Allocate device memory.
@table @asis
@item @emph{Description}
This function allocates @var{len} bytes of device memory. It returns
the device address of the allocated memory.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.18.
@end table



@node acc_free
@section @code{acc_free} -- Free device memory.
@table @asis
@item @emph{Description}
Free previously allocated device memory at the device address @var{a}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.19.
@end table



@node acc_copyin
@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
@table @asis
@item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory
and maps it to the specified host address in @var{a}. The device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
@table @asis
@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present or not. If it is not present, then device memory
will be allocated and the host memory copied. The device address of
the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_create
@section @code{acc_create} -- Allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes. In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present or not. If it is not present, then device memory
will be allocated and mapped to host memory. In C/C++, the device address
of the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_copyout
@section @code{acc_copyout} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
In C/C++, this function copies mapped device memory to host memory,
specified by the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.22.
@end table



@node acc_delete
@section @code{acc_delete} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory associated with
the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.23.
@end table



@node acc_update_device
@section @code{acc_update_device} -- Update device memory from mapped host memory.
@table @asis
@item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.24.
@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.25.
@end table



@node acc_map_data
@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
@table @asis
@item @emph{Description}
This function maps previously allocated device and host memory. The device
memory is specified with the device address @var{d}. The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.26.
@end table
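
A hedged sketch of the typical pairing with @code{acc_malloc} and
@code{acc_unmap_data} (buffer size and usage are illustrative, not from
the specification):

@smallexample
  size_t len = 1024;
  void *h = malloc (len);
  void *d = acc_malloc (len);

  /* Associate the host buffer with the device buffer.  */
  acc_map_data (h, d, len);

  /* ... data clauses now find 'h' already present on the device ... */

  /* Dissolve the association; neither buffer is freed by this.  */
  acc_unmap_data (h);

  acc_free (d);
  free (h);
@end smallexample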



@node acc_unmap_data
@section @code{acc_unmap_data} -- Unmap device memory from host memory.
@table @asis
@item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.27.
@end table



@node acc_deviceptr
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@table @asis
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.28.
@end table



@node acc_hostptr
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@table @asis
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.29.
@end table



@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@table @asis
@item @emph{Description}
This function indicates whether the host memory specified by the address
@var{a} and a length of @var{len} bytes is present on the device. In C/C++,
a non-zero value is returned to indicate that the memory is mapped on the
device; zero is returned to indicate that it is not mapped.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes. If the host
memory is mapped to device memory, @code{true} is returned. Otherwise,
@code{false} is returned to indicate the memory is not present.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{logical acc_is_present}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.30.
@end table
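
For example (a sketch; the array and its mapping are assumed),
@code{acc_is_present} can guard calls that require already-mapped memory:

@smallexample
  float v[100];

  acc_copyin (v, sizeof v);

  if (acc_is_present (v, sizeof v))
    acc_update_device (v, sizeof v);  /* Safe: 'v' is mapped.  */

  acc_delete (v, sizeof v);
@end smallexample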



@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@table @asis
@item @emph{Description}
This function copies host memory specified by the host address @var{src} to
device memory specified by the device address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.31.
@end table



@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies device memory specified by the device address @var{src}
to host memory specified by the host address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.32.
@end table
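
A round-trip sketch using both copy functions together with
@code{acc_malloc} (array sizes are illustrative):

@smallexample
  float in[64], out[64];
  void *d = acc_malloc (sizeof in);

  /* Host -> device, then device -> host.  */
  acc_memcpy_to_device (d, in, sizeof in);
  acc_memcpy_from_device (out, d, sizeof out);

  acc_free (d);
@end smallexample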



@node acc_attach
@section @code{acc_attach} -- Let device pointer point to device-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a host-pointer
address to pointing to the corresponding device data.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.34.
@end table



@node acc_detach
@section @code{acc_detach} -- Let device pointer point to host-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a device-pointer
address to pointing to the corresponding host data.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.35.
@end table



@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@table @asis
@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime and Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.1.
@end table



@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@table @asis
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime and Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.2.
@end table



@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime and Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.3.
@end table



@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.

This cannot be used to change the stream handle associated with
@code{acc_async_sync}.

The return value is not specified.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.4.
@end table



@node acc_prof_register
@section @code{acc_prof_register} -- Register callbacks.
@table @asis
@item @emph{Description}:
This function registers callbacks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_prof_unregister
@section @code{acc_prof_unregister} -- Unregister callbacks.
@table @asis
@item @emph{Description}:
This function unregisters callbacks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_prof_lookup
@section @code{acc_prof_lookup} -- Obtain inquiry functions.
@table @asis
@item @emph{Description}:
Function to obtain inquiry functions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_register_library
@section @code{acc_register_library} -- Library registration.
@table @asis
@item @emph{Description}:
Function for library registration.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table
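
As a sketch of library registration (the callback body is illustrative;
the declarations are assumed to come from the @code{acc_prof.h} header),
a minimal profiling library may look like:

@smallexample
#include <stdio.h>
#include <acc_prof.h>

static void
cb_launch (acc_prof_info *pi, acc_event_info *ei, acc_api_info *ai)
@{
  fprintf (stderr, "launch event on device type %d\n",
	   (int) pi->device_type);
@}

/* Invoked by the runtime at start-up, for example when this shared
   library is named in ACC_PROFLIB.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
		      acc_prof_lookup_func lookup)
@{
  reg (acc_ev_enqueue_launch_start, cb_launch, acc_reg);
@}
@end smallexample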



@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------

@node OpenACC Environment Variables
@chapter OpenACC Environment Variables

The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{ACC_PROFLIB}
is defined by section 4 of the OpenACC specification in version 2.6.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.

@menu
* ACC_DEVICE_TYPE::
* ACC_DEVICE_NUM::
* ACC_PROFLIB::
* GCC_ACC_NOTIFY::
@end menu



@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.1.
@end table



@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.2.
@end table



@node ACC_PROFLIB
@section @code{ACC_PROFLIB}
@table @asis
@item @emph{See also}:
@ref{acc_register_library}, @ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.3.
@end table



@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@table @asis
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@end table
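
For example, the variables may be combined as follows (the device type,
device number, library path, and program name shown are only
illustrative):

@smallexample
# Select the first NVIDIA offloading device and load a profiling library.
export ACC_DEVICE_TYPE=nvidia
export ACC_DEVICE_NUM=0
export ACC_PROFLIB=/path/to/libmyprof.so
./my-openacc-program
@end smallexample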



@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------

@node CUDA Streams Usage
@chapter CUDA Streams Usage

This applies to the @code{nvptx} plugin only.

The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs. This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.

The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives that make use of the
@code{async} and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.

Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used. When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, stream, have
completed.

Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies that the association between the @code{async-argument}
and the CUDA stream is maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}. When @code{acc_set_cuda_stream}
is called, the CUDA stream that was originally associated with the
@code{async} clause is destroyed. Caution should be taken when changing
the association, as subsequent references to the @code{async-argument}
refer to a different CUDA stream.
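
As a sketch (the queue number and the CUDA calls shown are illustrative),
the stream behind an @code{async} queue can be shared with CUDA code, or
replaced by a caller-created stream:

@smallexample
  /* Obtain the CUDA stream backing OpenACC queue 1 and pass it to
     CUDA library calls, so that operations serialize on one stream.  */
  cudaStream_t s = (cudaStream_t) acc_get_cuda_stream (1);
  /* ... cudaMemcpyAsync (..., s), cublasSetStream (handle, s), ... */

  /* Alternatively, install a caller-created stream for queue 1; the
     stream previously associated with queue 1 is destroyed.  */
  cudaStream_t mine;
  cudaStreamCreate (&mine);
  acc_set_cuda_stream (1, mine);
@end smallexample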



@c ---------------------------------------------------------------------
@c OpenACC Library Interoperability
@c ---------------------------------------------------------------------

@node OpenACC Library Interoperability
@chapter OpenACC Library Interoperability

@section Introduction

The OpenACC library uses the CUDA Driver API, and may interact with
programs that use the Runtime library directly, or another library
based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
"Interactions with the CUDA Driver API" in
"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
for additional information on library interoperability.}.
This chapter describes the use cases and what changes are
required in order to use both the OpenACC library and the CUBLAS and Runtime
libraries within a program.

@section First invocation: NVIDIA CUBLAS library API

In this first use case (see below), a function in the CUBLAS library is called
prior to any of the functions in the OpenACC library. More specifically, the
function @code{cublasCreate()}.

When invoked, the function initializes the library and allocates the
hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation have completed, a handle is returned to the
caller. The OpenACC library also requires initialization and allocation of
hardware resources. Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.

Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this. Once
acquired, the device number is passed along with the device type as
parameters to the OpenACC library function @code{acc_set_device_num()}.

Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries share the
same context.

@smallexample
  /* Create the handle */
  s = cublasCreate(&h);
  if (s != CUBLAS_STATUS_SUCCESS)
  @{
    fprintf(stderr, "cublasCreate failed %d\n", s);
    exit(EXIT_FAILURE);
  @}

  /* Get the device number */
  e = cudaGetDevice(&dev);
  if (e != cudaSuccess)
  @{
    fprintf(stderr, "cudaGetDevice failed %d\n", e);
    exit(EXIT_FAILURE);
  @}

  /* Initialize OpenACC library and use device 'dev' */
  acc_set_device_num(dev, acc_device_nvidia);

@end smallexample
@center Use Case 1

@section First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library. More specifically,
the function @code{acc_set_device_num()}.

In the use case presented here, the function @code{acc_set_device_num()}
is used both to initialize the OpenACC library and to allocate the hardware
resources on the host and the device. In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables; these are discussed in the next section.

Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called, as seen with the multiple calls made to
@code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In the use case, a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device. However, since the device has already been allocated,
@code{cublasCreate()} only initializes the CUBLAS library and allocates
the appropriate hardware resources on the host. The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.

@smallexample
  dev = 0;

  acc_set_device_num(dev, acc_device_nvidia);

  /* Copy the first set to the device */
  d_X = acc_copyin(&h_X[0], N * sizeof (float));
  if (d_X == NULL)
  @{
    fprintf(stderr, "copyin error h_X\n");
    exit(EXIT_FAILURE);
  @}

  /* Copy the second set to the device */
  d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
  if (d_Y == NULL)
  @{
    fprintf(stderr, "copyin error h_Y1\n");
    exit(EXIT_FAILURE);
  @}

  /* Create the handle */
  s = cublasCreate(&h);
  if (s != CUBLAS_STATUS_SUCCESS)
  @{
    fprintf(stderr, "cublasCreate failed %d\n", s);
    exit(EXIT_FAILURE);
  @}

  /* Perform saxpy using CUBLAS library function */
  s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
  if (s != CUBLAS_STATUS_SUCCESS)
  @{
    fprintf(stderr, "cublasSaxpy failed %d\n", s);
    exit(EXIT_FAILURE);
  @}

  /* Copy the results from the device */
  acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

@end smallexample
@center Use Case 2

@section OpenACC library and environment variables

There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}. As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.

The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC
Application Programming Interface}, Version 2.6.}.



@c ---------------------------------------------------------------------
@c OpenACC Profiling Interface
@c ---------------------------------------------------------------------

@node OpenACC Profiling Interface
@chapter OpenACC Profiling Interface

@section Implementation Status and Implementation-Defined Behavior

We're implementing the OpenACC Profiling Interface as defined by the
OpenACC 2.6 specification. We're clarifying some aspects here as
@emph{implementation-defined behavior}, while they're still under
discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as
possible for the (very common) case that the Profiling Interface is
not enabled. This is relevant, as the Profiling Interface affects all
the @emph{hot} code paths (in the target code, not in the offloaded
code). Users of the OpenACC Profiling Interface can be expected to
understand that performance will be impacted to some degree once the
Profiling Interface has been enabled: for example, because of the
@emph{runtime} (libgomp) calling into a third-party @emph{library} for
every event that has been registered.

We're not yet accounting for the fact that @cite{OpenACC events may
occur during event processing}.
We just handle one case specially, as required by CUDA 9.0
@command{nvprof}: @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks.

We're not yet implementing initialization via an
@code{acc_register_library} function that is either statically linked
in or dynamically loaded via @env{LD_PRELOAD}.
Initialization via @code{acc_register_library} functions dynamically
loaded via the @env{ACC_PROFLIB} environment variable does work, as
does directly calling @code{acc_prof_register},
@code{acc_prof_unregister}, and @code{acc_prof_lookup}.

As currently there are no inquiry functions defined, calls to
@code{acc_prof_lookup} will always return @code{NULL}.

There aren't separate @emph{start} and @emph{stop} events defined for the
event types @code{acc_ev_create}, @code{acc_ev_delete},
@code{acc_ev_alloc}, and @code{acc_ev_free}. It's not clear if these
should be triggered before or after the actual device-specific call is
made. We trigger them after.

Remarks about data provided to callbacks:

@table @asis

@item @code{acc_prof_info.event_type}
It's not clear if for @emph{nested} event callbacks (for example,
@code{acc_ev_enqueue_launch_start} as part of a parent compute
construct), this should be set for the nested event
(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
construct should remain (@code{acc_ev_compute_construct_start}). In
this implementation, the value will generally correspond to the
innermost nested event type.

@item @code{acc_prof_info.device_type}
@itemize

@item
For @code{acc_ev_compute_construct_start}, and in presence of an
@code{if} clause with @emph{false} argument, this will still refer to
the offloading device type.
It's not clear if that's the expected behavior.

@item
Complementary to the item before, for
@code{acc_ev_compute_construct_end}, this is set to
@code{acc_device_host} in presence of an @code{if} clause with
@emph{false} argument.
It's not clear if that's the expected behavior.

@end itemize

@item @code{acc_prof_info.thread_id}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.async}
@itemize

@item
Not yet implemented correctly for
@code{acc_ev_compute_construct_start}.

@item
In a compute construct, for host-fallback
execution/@code{acc_device_host} it will always be
@code{acc_async_sync}.
It's not clear if that's the expected behavior.

@item
For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
it will always be @code{acc_async_sync}.
It's not clear if that's the expected behavior.

@end itemize

@item @code{acc_prof_info.async_queue}
There is no @cite{limited number of asynchronous queues} in libgomp.
This will always have the same value as @code{acc_prof_info.async}.

@item @code{acc_prof_info.src_file}
Always @code{NULL}; not yet implemented.

@item @code{acc_prof_info.func_name}
Always @code{NULL}; not yet implemented.

@item @code{acc_prof_info.line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.end_line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.func_line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.func_end_line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
Relating to @code{acc_prof_info.event_type} discussed above, in this
implementation, this will always be the same value as
@code{acc_prof_info.event_type}.

@item @code{acc_event_info.*.parent_construct}
@itemize

@item
Will be @code{acc_construct_parallel} for all OpenACC compute
constructs as well as many OpenACC Runtime API calls; should be the
one matching the actual construct, or
@code{acc_construct_runtime_api}, respectively.

@item
Will be @code{acc_construct_enter_data} or
@code{acc_construct_exit_data} when processing variable mappings
specified in OpenACC @emph{declare} directives; should be
@code{acc_construct_declare}.

@item
For implicit @code{acc_ev_device_init_start},
@code{acc_ev_device_init_end}, and explicit as well as implicit
@code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, will be
@code{acc_construct_parallel}; should reflect the real parent
construct.

@end itemize

@item @code{acc_event_info.*.implicit}
For @code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, this will currently be @code{1}
even for explicit usage.

@item @code{acc_event_info.data_event.var_name}
Always @code{NULL}; not yet implemented.

@item @code{acc_event_info.data_event.host_ptr}
For @code{acc_ev_alloc} and @code{acc_ev_free}, this is always
@code{NULL}.

@item @code{typedef union acc_api_info}
@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
Information}. This should obviously be @code{typedef @emph{struct}
acc_api_info}.

@item @code{acc_api_info.device_api}
Possibly not yet implemented correctly for
@code{acc_ev_compute_construct_start},
@code{acc_ev_device_init_start}, and @code{acc_ev_device_init_end}:
will always be @code{acc_device_api_none} for these event types.
For @code{acc_ev_enter_data_start}, it will be
@code{acc_device_api_none} in some cases.

@item @code{acc_api_info.device_type}
Always the same as @code{acc_prof_info.device_type}.

@item @code{acc_api_info.vendor}
Always @code{-1}; not yet implemented.

@item @code{acc_api_info.device_handle}
Always @code{NULL}; not yet implemented.

@item @code{acc_api_info.context_handle}
Always @code{NULL}; not yet implemented.

@item @code{acc_api_info.async_handle}
Always @code{NULL}; not yet implemented.

@end table

Remarks about certain event types:

@table @asis

@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
@itemize

@item
@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
When a compute construct triggers implicit
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
events, they currently aren't @emph{nested within} the corresponding
@code{acc_ev_compute_construct_start} and
@code{acc_ev_compute_construct_end}, but are currently observed
@emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do here: the standard asks us to provide a lot
of details to the @code{acc_ev_compute_construct_start} callback, yet
how could we do that without (implicitly) initializing a device first?

@item
Callbacks for these event types will not be invoked for calls to the
@code{acc_set_device_type} and @code{acc_set_device_num} functions.
It's not clear if they should be.

@end itemize

@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
@itemize

@item
Callbacks for these event types will also be invoked for OpenACC
@emph{host_data} constructs.
It's not clear if they should be.

@item
Callbacks for these event types will also be invoked when processing
variable mappings specified in OpenACC @emph{declare} directives.
It's not clear if they should be.

@end itemize

@end table
4750
4751Callbacks for the following event types will be invoked, but dispatch
4752and information provided therein has not yet been thoroughly reviewed:
4753
4754@itemize
4755@item @code{acc_ev_alloc}
4756@item @code{acc_ev_free}
4757@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
4758@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
4759@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
4760@end itemize
4761
During device initialization and finalization, callbacks for the
following event types will not yet be invoked:

@itemize
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@end itemize

Callbacks for the following event types have not yet been implemented,
so currently won't be invoked:

@itemize
@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
@item @code{acc_ev_runtime_shutdown}
@item @code{acc_ev_create}, @code{acc_ev_delete}
@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
@end itemize

For the following runtime library functions, not all expected
callbacks will be invoked (mostly concerning implicit device
initialization):

@itemize
@item @code{acc_get_num_devices}
@item @code{acc_set_device_type}
@item @code{acc_get_device_type}
@item @code{acc_set_device_num}
@item @code{acc_get_device_num}
@item @code{acc_init}
@item @code{acc_shutdown}
@end itemize

Aside from implicit device initialization, for the following runtime
library functions, no callbacks will be invoked for shared-memory
offloading devices (it's not clear if they should be):

@itemize
@item @code{acc_malloc}
@item @code{acc_free}
@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
@item @code{acc_update_device}, @code{acc_update_device_async}
@item @code{acc_update_self}, @code{acc_update_self_async}
@item @code{acc_map_data}, @code{acc_unmap_data}
@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
@end itemize
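
For illustration, a minimal event callback can be registered directly
from the application; this is a sketch only, assuming the
@code{acc_prof.h} header that declares this interface:

@smallexample
#include <stdio.h>
#include <acc_prof.h>

/* Print a line for every data-allocation event.  */
static void
cb_alloc (acc_prof_info *prof_info, acc_event_info *event_info,
          acc_api_info *api_info)
@{
  fprintf (stderr, "acc_ev_alloc: %zu bytes on device %d\n",
           (size_t) event_info->data_event.bytes,
           prof_info->device_number);
@}

void
setup_profiling (void)
@{
  acc_prof_register (acc_ev_alloc, cb_alloc, acc_reg);
@}
@end smallexample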

@c ---------------------------------------------------------------------
@c OpenMP-Implementation Specifics
@c ---------------------------------------------------------------------

@node OpenMP-Implementation Specifics
@chapter OpenMP-Implementation Specifics

@menu
* Implementation-defined ICV Initialization::
* OpenMP Context Selectors::
* Memory allocation::
@end menu

@node Implementation-defined ICV Initialization
@section Implementation-defined ICV Initialization
@cindex Implementation specific setting

@multitable @columnfractions .30 .70
@item @var{affinity-format-var} @tab See @ref{OMP_AFFINITY_FORMAT}.
@item @var{def-allocator-var} @tab See @ref{OMP_ALLOCATOR}.
@item @var{max-active-levels-var} @tab See @ref{OMP_MAX_ACTIVE_LEVELS}.
@item @var{dyn-var} @tab See @ref{OMP_DYNAMIC}.
@item @var{nthreads-var} @tab See @ref{OMP_NUM_THREADS}.
@item @var{num-devices-var} @tab Number of non-host devices found
by GCC's run-time library.
@item @var{num-procs-var} @tab The number of CPU cores on the
initial device, except that affinity settings might lead to a
smaller number. On non-host devices, the value of the
@var{nthreads-var} ICV.
@item @var{place-partition-var} @tab See @ref{OMP_PLACES}.
@item @var{run-sched-var} @tab See @ref{OMP_SCHEDULE}.
@item @var{stacksize-var} @tab See @ref{OMP_STACKSIZE}.
@item @var{thread-limit-var} @tab See @ref{OMP_TEAMS_THREAD_LIMIT}.
@item @var{wait-policy-var} @tab See @ref{OMP_WAIT_POLICY} and
@ref{GOMP_SPINCOUNT}.
@end multitable

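The initial values can be inspected at runtime through the usual API
routines; a small sketch:

@smallexample
#include <stdio.h>
#include <omp.h>

int main (void)
@{
  /* Query a few of the ICVs listed above.  */
  printf ("nthreads-var:    %d\n", omp_get_max_threads ());
  printf ("num-procs-var:   %d\n", omp_get_num_procs ());
  printf ("num-devices-var: %d\n", omp_get_num_devices ());
  printf ("dyn-var:         %d\n", omp_get_dynamic ());
  return 0;
@}
@end smallexample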
@node OpenMP Context Selectors
@section OpenMP Context Selectors

@code{vendor} is always @code{gnu}. References are to the GCC manual.

@multitable @columnfractions .60 .10 .25
@headitem @code{arch} @tab @code{kind} @tab @code{isa}
@item @code{x86}, @code{x86_64}, @code{i386}, @code{i486},
      @code{i586}, @code{i686}, @code{ia32}
      @tab @code{host}
      @tab See @code{-m...} flags in ``x86 Options'' (without @code{-m})
@item @code{amdgcn}, @code{gcn}
      @tab @code{gpu}
      @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally,
      @code{gfx803} is supported as an alias for @code{fiji}.}
@item @code{nvptx}
      @tab @code{gpu}
      @tab See @code{-march=} in ``Nvidia PTX Options''
@end multitable

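Such selectors appear, for instance, in the @code{declare variant}
directive. A sketch, with made-up function names, selecting a variant
when compiling for an nvptx device:

@smallexample
/* Specialized variant, picked when the OpenMP context matches
   arch(nvptx).  */
void saxpy_nvptx (int n, float a, const float *x, float *y)
@{
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
@}

#pragma omp declare variant (saxpy_nvptx) match (device=@{arch(nvptx)@})
void saxpy (int n, float a, const float *x, float *y)
@{
  /* Generic host fallback.  */
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
@}
@end smallexample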

@node Memory allocation
@section Memory allocation

For the available predefined allocators and, as applicable, their associated
predefined memory spaces and for the available traits and their default values,
see @ref{OMP_ALLOCATOR}. Predefined allocators without an associated memory
space use the @code{omp_default_mem_space} memory space.

For the memory spaces, the following applies:
@itemize
@item @code{omp_default_mem_space} is supported
@item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
@item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
      unless the memkind library is available
@item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
      unless the memkind library is available
@end itemize

On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
library} (@code{libmemkind.so.0}) is available at runtime, it is used when
creating memory allocators requesting

@itemize
@item the memory space @code{omp_high_bw_mem_space}
@item the memory space @code{omp_large_cap_mem_space}
@item the @code{partition} trait @code{interleaved}; note that for
      @code{omp_large_cap_mem_space} the allocation will not be interleaved
@end itemize

On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
library} (@code{libnuma.so.1}) is available at runtime, it is used when
creating memory allocators requesting

@itemize
@item the @code{partition} trait @code{nearest}, except when both the
libmemkind library is available and the memory space is either
@code{omp_large_cap_mem_space} or @code{omp_high_bw_mem_space}
@end itemize

Note that the numa library will round up the allocation size to a multiple of
the system page size; therefore, consider using it only with large data or
by sharing allocations via the @code{pool_size} trait. Furthermore, the Linux
kernel does not guarantee that an allocation will always be on the nearest NUMA
node nor that after reallocation the same node will be used. Note additionally
that, on Linux, the default setting of the memory placement policy is to use the
current node; therefore, unless the memory placement policy has been overridden,
the @code{partition} trait @code{environment} (the default) will effectively be
a @code{nearest} allocation.

Additional notes regarding the traits:
@itemize
@item The @code{pinned} trait is unsupported.
@item The default for the @code{pool_size} trait is no pool and for every
      (re)allocation the associated library routine is called, which might
      internally use a memory pool.
@item For the @code{partition} trait, the partition part size will be the same
      as the requested size (i.e. @code{interleaved} or @code{blocked} has no
      effect), except for @code{interleaved} when the memkind library is
      available. Furthermore, for @code{nearest} and unless the numa library
      is available, the memory might not be on the same NUMA node as the
      thread that allocated the memory; on Linux, this is in particular the
      case when the memory placement policy is set to preferred.
@item The @code{access} trait has no effect; memory is always accessible
      by all threads.
@item The @code{sync_hint} trait has no effect.
@end itemize
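
Putting this together, a custom allocator with explicit traits can be
created with the standard OpenMP routines; a minimal sketch:

@smallexample
#include <stdio.h>
#include <stdint.h>
#include <omp.h>

int main (void)
@{
  /* Request 64-byte-aligned allocations from the default memory space,
     falling back to the default allocator on failure.  */
  omp_alloctrait_t traits[] = @{
    @{ omp_atk_alignment, 64 @},
    @{ omp_atk_fallback, omp_atv_default_mem_fb @}
  @};
  omp_allocator_handle_t al
    = omp_init_allocator (omp_default_mem_space, 2, traits);

  double *p = (double *) omp_alloc (128 * sizeof (double), al);
  printf ("%d\n", p != NULL && ((uintptr_t) p % 64) == 0);
  omp_free (p, al);
  omp_destroy_allocator (al);
  return 0;
@}
@end smallexample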

@c ---------------------------------------------------------------------
@c Offload-Target Specifics
@c ---------------------------------------------------------------------

@node Offload-Target Specifics
@chapter Offload-Target Specifics

The following sections present notes on the offload-target specifics.

@menu
* AMD Radeon::
* nvptx::
@end menu

@node AMD Radeon
@section AMD Radeon (GCN)

On the hardware side, there is the hierarchy (fine to coarse):
@itemize
@item work item (thread)
@item wavefront
@item work group
@item compute unit (CU)
@end itemize

All OpenMP and OpenACC levels are used, i.e.
@itemize
@item OpenMP's simd and OpenACC's vector map to work items (thread)
@item OpenMP's threads (``parallel'') and OpenACC's workers map
      to wavefronts
@item OpenMP's teams and OpenACC's gang use a threadpool with the
      size of the number of teams or gangs, respectively.
@end itemize

The used sizes are
@itemize
@item Number of teams is the specified @code{num_teams} (OpenMP) or
      @code{num_gangs} (OpenACC) or otherwise the number of CUs. It is
      limited by two times the number of CUs.
@item Number of wavefronts is 4 for gfx900 and 16 otherwise;
      @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
      override this if smaller.
@item The wavefront has 102 scalar registers and 64 vector registers
@item Number of work items is always 64
@item The hardware permits maximally 40 workgroups/CU and
      16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
@item 80 scalar registers and 24 vector registers in non-kernel functions
      (the chosen procedure-calling API).
@item For the kernel itself: as many as register pressure demands (number of
      teams and number of threads, scaled down if registers are exhausted)
@end itemize

Implementation remarks:
@itemize
@item I/O within OpenMP target regions and OpenACC parallel/kernels is
      supported using the C library @code{printf} functions and the Fortran
      @code{print}/@code{write} statements.
@item Reverse offload regions (i.e. @code{target} regions with
      @code{device(ancestor:1)}) are processed serially per @code{target}
      region such that the next reverse offload region is only executed
      after the previous one returned.
@item OpenMP code that has a @code{requires} directive with
      @code{unified_shared_memory} will remove any GCN device from the list
      of available devices (``host fallback'').
@item The available stack size can be changed using the @code{GCN_STACK_SIZE}
      environment variable; the default is 32 kiB per thread.
@end itemize
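
For reference, a reverse offload region has the following shape; this
is a sketch only, and with no offload device configured the whole
construct simply runs on the host:

@smallexample
#include <stdio.h>

#pragma omp requires reverse_offload

int main (void)
@{
  #pragma omp target
  @{
    /* Runs back on the host; per the remark above, such regions are
       processed serially per target region.  */
    #pragma omp target device (ancestor : 1)
    printf ("hello from the host\n");
  @}
  return 0;
@}
@end smallexample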



@node nvptx
@section nvptx

On the hardware side, there is the hierarchy (fine to coarse):
@itemize
@item thread
@item warp
@item thread block
@item streaming multiprocessor
@end itemize

All OpenMP and OpenACC levels are used, i.e.
@itemize
@item OpenMP's simd and OpenACC's vector map to threads
@item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
@item OpenMP's teams and OpenACC's gang use a threadpool with the
      size of the number of teams or gangs, respectively.
@end itemize

The used sizes are
@itemize
@item The @code{warp_size} is always 32
@item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
@item The number of teams is limited by the number of blocks the device can
      host simultaneously.
@end itemize

Additional information can be obtained by setting the environment variable
@code{GOMP_DEBUG=1} (very verbose; grep for @code{kernel.*launch} for launch
parameters).

GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA,
which caches the JIT-compiled code in the user's directory (see the CUDA
documentation; this can be tuned by the environment variables
@code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).

Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=} commandline
options still affect the used PTX ISA code and, thus, the requirements on
CUDA version and hardware.

Implementation remarks:
@itemize
@item I/O within OpenMP target regions and OpenACC parallel/kernels is
      supported using the C library @code{printf} functions. Note that the
      Fortran @code{print}/@code{write} statements are not supported, yet.
@item Compiling OpenMP code that contains @code{requires reverse_offload}
      requires at least @code{-march=sm_35}; compiling for
      @code{-march=sm_30} is not supported.
@item For code containing reverse offload (i.e. @code{target} regions with
      @code{device(ancestor:1)}), there is a slight performance penalty
      for @emph{all} target regions, consisting mostly of shutdown delay.
      Per device, reverse offload regions are processed serially such that
      the next reverse offload region is only executed after the previous
      one returned.
@item OpenMP code that has a @code{requires} directive with
      @code{unified_shared_memory} will remove any nvptx device from the
      list of available devices (``host fallback'').
@item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
      in the GCC manual.
@item The OpenMP routines @code{omp_target_memcpy_rect} and
      @code{omp_target_memcpy_rect_async} and the @code{target update}
      directive for non-contiguous list items will use the 2D and 3D
      memory-copy functions of the CUDA library. Higher dimensions will
      call those functions in a loop and are therefore supported.
@end itemize
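
As an illustration of the rectangular copy routines, the following
sketch copies a 2x3 sub-block between two host-side matrices, using the
initial device as both source and destination:

@smallexample
#include <stdio.h>
#include <omp.h>

int main (void)
@{
  double src[4][5], dst[4][5] = @{ @{ 0 @} @};
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 5; j++)
      src[i][j] = 10 * i + j;

  size_t volume[2]     = @{ 2, 3 @};  /* copy a 2x3 block...  */
  size_t offsets[2]    = @{ 1, 1 @};  /* ...starting at [1][1].  */
  size_t dimensions[2] = @{ 4, 5 @};
  int dev = omp_get_initial_device ();

  omp_target_memcpy_rect (dst, src, sizeof (double), 2, volume,
                          offsets, offsets, dimensions, dimensions,
                          dev, dev);
  printf ("%g %g\n", dst[1][1], dst[2][3]);
  return 0;
@}
@end smallexample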


@c ---------------------------------------------------------------------
@c The libgomp ABI
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp. Only maintainers should need them.

@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu


@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample

Alternatively, we generate two copies of the parallel subfunction
and only include this in the version run by the primary thread.
Surely this is not worthwhile though...



@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name,

@smallexample
  void GOMP_critical_start (void);
  void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

@smallexample
  omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at
startup.



@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.



@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.



@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample


@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread}.
Except that OMP allows constructors for C++ objects. We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library. Failing that, we can have a set
of entry points to register ctor functions to be called.



@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function. This preserves
the semantic of new variable creation.



@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks. Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalars and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

which becomes

@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C. Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.



@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}. The thread stores its final value into the
array, and after the barrier, the primary thread iterates over the
array to collect the values.
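
For a @code{reduction(+:sum)} this might look like the following
sketch (the variable and member names are illustrative only):

@smallexample
  void subfunction (struct data *data)
  @{
    long sum = 0;
    body;   /* accumulates into the local `sum' */
    data->sum_array[team_id] = sum;
    GOMP_barrier ();
    if (team_id == 0)
      for (i = 0; i < nthreads; i++)
        data->sum += data->sum_array[i];
  @}
@end smallexample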


@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock. It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.



@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
    @{
      long _e1 = _e0, i;
      for (i = _s0; i < _e1; i++)
        body;
    @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really. We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables. So the expression should remain evaluable in the
subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also not if we like.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations. Which would mean that we wouldn't need to call any
of these routines.

There are separate routines for handling loops with an ORDERED
clause. Bookkeeping for that is non-trivial...



@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
``openacc'', or ``openmp'', or both to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye