\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2023 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@ifinfo
@dircategory GNU Libraries
@direntry
* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
@end direntry

This manual documents libgomp, the GNU Offloading and Multi Processing
Runtime library.  This is the GNU implementation of the OpenMP and
OpenACC APIs for parallel and accelerator programming in C/C++ and
Fortran.

Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA

@insertcopying
@end ifinfo


@setchapternewpage odd

@titlepage
@title GNU Offloading and Multi Processing Runtime Library
@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@sp 1
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor@*
Boston, MA 02110-1301, USA@*
@sp 1
@insertcopying
@end titlepage

@summarycontents
@contents
@page

@node Top, Enabling OpenMP
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Support
for OpenACC and offloading (both via OpenACC and OpenMP 4's @code{target}
construct) was added later, and the library was renamed the GNU Offloading
and Multi Processing Runtime Library.


@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Implementation Status:: List of implemented features by OpenMP version
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* OpenACC Profiling Interface::
* OpenMP-Implementation Specifics:: Notes on specifics of this OpenMP
                               implementation
* Offload-Target Specifics::   Notes on offload-target specific internals
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU general public license says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
@code{#pragma omp} in C/C++ and, for Fortran, the @code{!$omp} directive in
free form, the @code{c$omp}, @code{*$omp} and @code{!$omp} directives in
fixed form, the @code{!$} conditional compilation sentinel in free form, and
the @code{c$}, @code{*$} and @code{!$} sentinels in fixed form.  The flag
also arranges for automatic linking of the OpenMP runtime library
(@ref{Runtime Library Routines}).

A complete description of all OpenMP directives may be found in the
@uref{https://www.openmp.org, OpenMP Application Program Interface} manuals.
See also @ref{OpenMP Implementation Status}.


@c ---------------------------------------------------------------------
@c OpenMP Implementation Status
@c ---------------------------------------------------------------------

@node OpenMP Implementation Status
@chapter OpenMP Implementation Status

@menu
* OpenMP 4.5:: Feature completion status to 4.5 specification
* OpenMP 5.0:: Feature completion status to 5.0 specification
* OpenMP 5.1:: Feature completion status to 5.1 specification
* OpenMP 5.2:: Feature completion status to 5.2 specification
* OpenMP Technical Report 11:: Feature completion status to first 6.0 preview
@end menu

The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
parameter, provided by @code{omp_lib.h} and the @code{omp_lib} module, have
the value @code{201511} (i.e. OpenMP 4.5).

@node OpenMP 4.5
@section OpenMP 4.5

The OpenMP 4.5 specification is fully supported.

@node OpenMP 5.0
@section OpenMP 5.0

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Array shaping @tab N @tab
@item Array sections with non-unit strides in C and C++ @tab N @tab
@item Iterators @tab Y @tab
@item @code{metadirective} directive @tab N @tab
@item @code{declare variant} directive
      @tab P @tab @emph{simd} traits not handled correctly
@item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
      env variable @tab Y @tab
@item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
@item @code{requires} directive @tab P
      @tab complete but no non-host device provides @code{unified_shared_memory}
@item @code{teams} construct outside an enclosing target region @tab Y @tab
@item Non-rectangular loop nests @tab P
      @tab Full support for C/C++, partial for Fortran
      (@uref{https://gcc.gnu.org/PR110735,PR110735})
@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
      constructs @tab Y @tab
@item Collapse of associated loops that are imperfectly nested loops @tab N @tab
@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
      @code{simd} construct @tab Y @tab
@item @code{atomic} constructs in @code{simd} @tab Y @tab
@item @code{loop} construct @tab Y @tab
@item @code{order(concurrent)} clause @tab Y @tab
@item @code{scan} directive and @code{in_scan} modifier for the
      @code{reduction} clause @tab Y @tab
@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
@item @code{in_reduction} clause on @code{target} constructs @tab P
      @tab @code{nowait} only stub
@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
@item @code{task} modifier to @code{reduction} clause @tab Y @tab
@item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
@item @code{detach} clause to @code{task} construct @tab Y @tab
@item @code{omp_fulfill_event} runtime routine @tab Y @tab
@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
      and @code{taskloop simd} constructs @tab Y @tab
@item @code{taskloop} construct cancelable by @code{cancel} construct
      @tab Y @tab
@item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
      @tab Y @tab
@item Predefined memory spaces, memory allocators, allocator traits
      @tab Y @tab See also @ref{Memory allocation}
@item Memory management routines @tab Y @tab
@item @code{allocate} directive @tab N @tab
@item @code{allocate} clause @tab P @tab Initial support
@item @code{use_device_addr} clause on @code{target data} @tab Y @tab
@item @code{ancestor} modifier on @code{device} clause @tab Y @tab
@item Implicit declare target directive @tab Y @tab
@item Discontiguous array section with @code{target update} construct
      @tab N @tab
@item C/C++'s lvalue expressions in @code{to}, @code{from}
      and @code{map} clauses @tab N @tab
@item C/C++'s lvalue expressions in @code{depend} clauses @tab Y @tab
@item Nested @code{declare target} directive @tab Y @tab
@item Combined @code{master} constructs @tab Y @tab
@item @code{depend} clause on @code{taskwait} @tab Y @tab
@item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
      @tab Y @tab
@item @code{hint} clause on the @code{atomic} construct @tab Y @tab Stub only
@item @code{depobj} construct and depend objects @tab Y @tab
@item Lock hints were renamed to synchronization hints @tab Y @tab
@item @code{conditional} modifier to @code{lastprivate} clause @tab Y @tab
@item Map-order clarifications @tab P @tab
@item @code{close} @emph{map-type-modifier} @tab Y @tab
@item Mapping C/C++ pointer variables and to assign the address of
      device memory mapped by an array section @tab P @tab
@item Mapping of Fortran pointer and allocatable variables, including pointer
      and allocatable components of variables
      @tab P @tab Mapping of vars with allocatable components unsupported
@item @code{defaultmap} extensions @tab Y @tab
@item @code{declare mapper} directive @tab N @tab
@item @code{omp_get_supported_active_levels} routine @tab Y @tab
@item Runtime routines and environment variables to display runtime thread
      affinity information @tab Y @tab
@item @code{omp_pause_resource} and @code{omp_pause_resource_all} runtime
      routines @tab Y @tab
@item @code{omp_get_device_num} runtime routine @tab Y @tab
@item OMPT interface @tab N @tab
@item OMPD interface @tab N @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.0 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Supporting C++'s range-based for loop @tab Y @tab
@end multitable


@node OpenMP 5.1
@section OpenMP 5.1

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item OpenMP directive as C++ attribute specifiers @tab Y @tab
@item @code{omp_all_memory} reserved locator @tab Y @tab
@item @emph{target_device trait} in OpenMP Context @tab N @tab
@item @code{target_device} selector set in context selectors @tab N @tab
@item C/C++'s @code{declare variant} directive: elision support of
      preprocessed code @tab N @tab
@item @code{declare variant}: new clauses @code{adjust_args} and
      @code{append_args} @tab N @tab
@item @code{dispatch} construct @tab N @tab
@item device-specific ICV settings with environment variables @tab Y @tab
@item @code{assume} and @code{assumes} directives @tab Y @tab
@item @code{nothing} directive @tab Y @tab
@item @code{error} directive @tab Y @tab
@item @code{masked} construct @tab Y @tab
@item @code{scope} directive @tab Y @tab
@item Loop transformation constructs @tab N @tab
@item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
      clauses of the @code{taskloop} construct @tab Y @tab
@item @code{align} clause in @code{allocate} directive @tab N @tab
@item @code{align} modifier in @code{allocate} clause @tab Y @tab
@item @code{thread_limit} clause to @code{target} construct @tab Y @tab
@item @code{has_device_addr} clause to @code{target} construct @tab Y @tab
@item Iterators in @code{target update} motion clauses and @code{map}
      clauses @tab N @tab
@item Indirect calls to the device version of a procedure or function in
      @code{target} regions @tab N @tab
@item @code{interop} directive @tab N @tab
@item @code{omp_interop_t} object support in runtime routines @tab N @tab
@item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
@item Extensions to the @code{atomic} directive @tab Y @tab
@item @code{seq_cst} clause on a @code{flush} construct @tab Y @tab
@item @code{inoutset} argument to the @code{depend} clause @tab Y @tab
@item @code{private} and @code{firstprivate} argument to @code{default}
      clause in C and C++ @tab Y @tab
@item @code{present} argument to @code{defaultmap} clause @tab Y @tab
@item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
      @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
      routines @tab Y @tab
@item @code{omp_target_is_accessible} runtime routine @tab Y @tab
@item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
      runtime routines @tab Y @tab
@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
@item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
      @code{omp_aligned_calloc} runtime routines @tab Y @tab
@item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
      @code{omp_atv_default} changed @tab Y @tab
@item @code{omp_display_env} runtime routine @tab Y @tab
@item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
@item @code{ompt_sync_region_t} enum additions @tab N @tab
@item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}
      and @code{ompt_state_wait_barrier_teams} @tab N @tab
@item @code{ompt_callback_target_data_op_emi_t},
      @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
      and @code{ompt_callback_target_submit_emi_t} @tab N @tab
@item @code{ompt_callback_error_t} type @tab N @tab
@item @code{OMP_PLACES} syntax extensions @tab Y @tab
@item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
      variables @tab Y @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.1 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Support of strictly structured blocks in Fortran @tab Y @tab
@item Support of structured block sequences in C/C++ @tab Y @tab
@item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
      clause @tab Y @tab
@item Support @code{begin/end declare target} syntax in C/C++ @tab Y @tab
@item Pointer predetermined firstprivate getting initialized
      to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item For Fortran, diagnose placing declarative before/between @code{USE},
      @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
@item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
@item @code{indirect} clause in @code{declare target} @tab N @tab
@item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
@item @code{present} modifier to the @code{map}, @code{to} and @code{from}
      clauses @tab Y @tab
@end multitable


@node OpenMP 5.2
@section OpenMP 5.2

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item @code{omp_in_explicit_task} routine and @var{explicit-task-var} ICV
      @tab Y @tab
@item @code{omp}/@code{ompx}/@code{omx} sentinels and @code{omp_}/@code{ompx_}
      namespaces @tab N/A
      @tab warning for @code{ompx/omx} sentinels@footnote{The @code{ompx}
      sentinel as C/C++ pragma and C++ attributes are warned for with
      @code{-Wunknown-pragmas} (implied by @code{-Wall}) and @code{-Wattributes}
      (enabled by default), respectively; for Fortran free-source code, there is
      a warning enabled by default and, for fixed-source code, the @code{omx}
      sentinel is warned for with @code{-Wsurprising} (enabled by
      @code{-Wall}).  Unknown clauses are always rejected with an error.}
@item Clauses on @code{end} directive can be on directive @tab Y @tab
@item Deprecation of no-argument @code{destroy} clause on @code{depobj}
      @tab N @tab
@item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
@item Deprecation of minus operator for reductions @tab N @tab
@item Deprecation of separating @code{map} modifiers without comma @tab N @tab
@item @code{declare mapper} with iterator and @code{present} modifiers
      @tab N @tab
@item If a matching mapped list item is not found in the data environment, the
      pointer retains its original value @tab Y @tab
@item New @code{enter} clause as alias for @code{to} on declare target directive
      @tab Y @tab
@item Deprecation of @code{to} clause on declare target directive @tab N @tab
@item Extended list of directives permitted in Fortran pure procedures
      @tab Y @tab
@item New @code{allocators} directive for Fortran @tab N @tab
@item Deprecation of @code{allocate} directive for Fortran
      allocatables/pointers @tab N @tab
@item Optional paired @code{end} directive with @code{dispatch} @tab N @tab
@item New @code{memspace} and @code{traits} modifiers for @code{uses_allocators}
      @tab N @tab
@item Deprecation of traits array following the allocator_handle expression in
      @code{uses_allocators} @tab N @tab
@item New @code{otherwise} clause as alias for @code{default} on metadirectives
      @tab N @tab
@item Deprecation of @code{default} clause on metadirectives @tab N @tab
@item Deprecation of delimited form of @code{declare target} @tab N @tab
@item Reproducible semantics changed for @code{order(concurrent)} @tab N @tab
@item @code{allocate} and @code{firstprivate} clauses on @code{scope}
      @tab Y @tab
@item @code{ompt_callback_work} @tab N @tab
@item Default map-type for the @code{map} clause in @code{target enter/exit data}
      @tab Y @tab
@item New @code{doacross} clause as alias for @code{depend} with
      @code{source}/@code{sink} modifier @tab Y @tab
@item Deprecation of @code{depend} with @code{source}/@code{sink} modifier
      @tab N @tab
@item @code{omp_cur_iteration} keyword @tab Y @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.2 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item For Fortran, optional comma between directive and clause @tab N @tab
@item Conforming device numbers and @code{omp_initial_device} and
      @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
@item Initial value of @var{default-device-var} ICV with
      @code{OMP_TARGET_OFFLOAD=mandatory} @tab Y @tab
@item @code{all} as @emph{implicit-behavior} for @code{defaultmap} @tab N @tab
@item @emph{interop_types} in any position of the modifier list for the @code{init} clause
      of the @code{interop} construct @tab N @tab
@end multitable


@node OpenMP Technical Report 11
@section OpenMP Technical Report 11

Technical Report (TR) 11 is the first preview for OpenMP 6.0.

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
@multitable @columnfractions .60 .10 .25
@item Features deprecated in versions 5.2, 5.1 and 5.0 were removed
      @tab N/A @tab Backward compatibility
@item The @code{decl} attribute was added to the C++ attribute syntax
      @tab N @tab
@item @code{_ALL} suffix to the device-scope environment variables
      @tab P @tab Host device number wrongly accepted
@item For Fortran, @emph{locator list} can be also function reference with
      data pointer result @tab N @tab
@item Ref-count change for @code{use_device_ptr}/@code{use_device_addr}
      @tab N @tab
@item Implicit reduction identifiers of C++ classes
      @tab N @tab
@item Change of the @emph{map-type} property from @emph{ultimate} to
      @emph{default} @tab N @tab
@item Concept of @emph{assumed-size arrays} in C and C++
      @tab N @tab
@item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
      @tab N @tab
@item @code{groupprivate} directive @tab N @tab
@item @code{local} clause to declare target directive @tab N @tab
@item @code{part_size} allocator trait @tab N @tab
@item @code{pin_device}, @code{preferred_device} and @code{target_access}
      allocator traits
      @tab N @tab
@item @code{access} allocator trait changes @tab N @tab
@item Extension of @code{interop} operation of @code{append_args}, allowing all
      modifiers of the @code{init} clause
      @tab N @tab
@item @code{interop} clause to @code{dispatch} @tab N @tab
@item @code{apply} code to loop-transforming constructs @tab N @tab
@item @code{omp_curr_progress_width} identifier @tab N @tab
@item @code{safesync} clause to the @code{parallel} construct @tab N @tab
@item @code{omp_get_max_progress_width} runtime routine @tab N @tab
@item @code{strict} modifier keyword to @code{num_threads} @tab N @tab
@item @code{memscope} clause to @code{atomic} and @code{flush} @tab N @tab
@item Routines for obtaining memory spaces/allocators for shared/device memory
      @tab N @tab
@item @code{omp_get_memspace_num_resources} routine @tab N @tab
@item @code{omp_get_submemspace} routine @tab N @tab
@item @code{ompt_get_buffer_limits} OMPT routine @tab N @tab
@item Extension of @code{OMP_DEFAULT_DEVICE} and new
      @code{OMP_AVAILABLE_DEVICES} environment vars @tab N @tab
@item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
@end multitable

@unnumberedsubsec Other new TR 11 features
@multitable @columnfractions .60 .10 .25
@item Relaxed Fortran restrictions to the @code{aligned} clause @tab N @tab
@item Mapping lambda captures @tab N @tab
@item For Fortran, atomic compare with storing the comparison result
      @tab N @tab
@end multitable



@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 18 of the OpenMP
specification in version 5.2.

@menu
* Thread Team Routines::
* Thread Affinity Routines::
* Teams Region Routines::
* Tasking Routines::
@c * Resource Relinquishing Routines::
* Device Information Routines::
@c * Device Memory Routines::
* Lock Routines::
* Timing Routines::
* Event Routine::
@c * Interoperability Routines::
@c * Memory Management Routines::
@c * Tool Control Routine::
@c * Environment Display Routine::
@end menu



@node Thread Team Routines
@section Thread Team Routines

Routines controlling threads in the current contention group.
They have C linkage and do not throw exceptions.

@menu
* omp_set_num_threads::         Set upper team size limit
* omp_get_num_threads::         Size of the active team
* omp_get_max_threads::         Maximum number of threads of parallel region
* omp_get_thread_num::          Current thread ID
* omp_in_parallel::             Whether a parallel region is active
* omp_set_dynamic::             Enable/disable dynamic teams
* omp_get_dynamic::             Dynamic teams setting
* omp_get_cancellation::        Whether cancellation support is enabled
* omp_set_nested::              Enable/disable nested parallel regions
* omp_get_nested::              Nested parallel regions
* omp_set_schedule::            Set the runtime scheduling method
* omp_get_schedule::            Obtain the runtime scheduling method
* omp_get_teams_thread_limit::  Maximum number of threads imposed by teams
* omp_get_supported_active_levels:: Maximum number of active regions supported
* omp_set_max_active_levels::   Limits the number of active parallel regions
* omp_get_max_active_levels::   Current maximum number of active regions
* omp_get_level::               Number of parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_team_size::           Number of threads in a team
* omp_get_active_level::        Number of active parallel regions
@end menu



@node omp_set_num_threads
@subsection @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
regions, if those do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item                   @tab @code{integer, intent(in) :: num_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table

@node omp_get_num_threads
@subsection @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential section of
the program @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table



@node omp_get_max_threads
@subsection @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@table @asis
@item @emph{Description}:
Returns the maximum number of threads that would be used to form a team
if a parallel region without a @code{num_threads} clause were encountered
at this point in the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table



@node omp_get_thread_num
@subsection @code{omp_get_thread_num} -- Current thread ID
@table @asis
@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the primary thread of a team is always 0.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@end table


@node omp_in_parallel
@subsection @code{omp_in_parallel} -- Whether a parallel region is active
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@end table

@node omp_set_dynamic
@subsection @code{omp_set_dynamic} -- Enable/disable dynamic teams
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item @tab @code{logical, intent(in) :: dynamic_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@end table


@node omp_get_dynamic
@subsection @code{omp_get_dynamic} -- Dynamic teams setting
@table @asis
@item @emph{Description}:
This function returns @code{true} if the dynamic adjustment of the
number of threads is enabled, @code{false} otherwise.  Here,
@code{true} and @code{false} represent their language-specific
counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
@end multitable

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table


@node omp_get_cancellation
@subsection @code{omp_get_cancellation} -- Whether cancellation support is enabled
@table @asis
@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set true, cancellations are
deactivated.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
@end multitable

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table


@node omp_set_nested
@subsection @code{omp_set_nested} -- Enable/disable nested parallel regions
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

Enabling nested parallel regions will also set the maximum number of
active nested regions to the maximum supported.  Disabling nested parallel
regions will set the maximum number of active nested regions to one.

Note that the @code{omp_set_nested} API routine was deprecated
in the OpenMP specification 5.2 in favor of @code{omp_set_max_active_levels}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}
@end multitable

@item @emph{See also}:
@ref{omp_get_nested}, @ref{omp_set_max_active_levels},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table


@node omp_get_nested
@subsection @code{omp_get_nested} -- Nested parallel regions
@table @asis
@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

The state of nested parallel regions at startup depends on several
environment variables.  If @env{OMP_MAX_ACTIVE_LEVELS} is defined
and is set to a value greater than one, then nested parallel regions
are enabled.  If not defined, then the value of the @env{OMP_NESTED}
environment variable is followed if defined.  If neither is defined,
then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} is defined
with a list of more than one value, then nested parallel regions are
enabled.  If none of these are defined, then nested parallel regions
are disabled by default.

Nested parallel regions can be enabled or disabled at runtime using
@code{omp_set_nested}, or by setting the maximum number of nested
regions with @code{omp_set_max_active_levels} to one to disable, or
above one to enable.

Note that the @code{omp_get_nested} API routine was deprecated
in the OpenMP specification 5.2 in favor of @code{omp_get_max_active_levels}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_set_nested},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table


@node omp_set_schedule
@subsection @code{omp_set_schedule} -- Set the runtime scheduling method
@table @asis
@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
@code{omp_sched_auto}, the chunk size is set to the value of
@var{chunk_size} if positive, or to the default value if zero or negative.
For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_get_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
@end table


@node omp_get_schedule
@subsection @code{omp_get_schedule} -- Obtain the runtime scheduling method
@table @asis
@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@end table


@node omp_get_teams_thread_limit
@subsection @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
@table @asis
@item @emph{Description}:
Return the maximum number of threads that will be able to participate in
each team created by a teams construct.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
@end table


@node omp_get_supported_active_levels
@subsection @code{omp_get_supported_active_levels} -- Maximum number of active regions supported
@table @asis
@item @emph{Description}:
This function returns the maximum number of nested, active parallel regions
supported by this implementation.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
@end table


@node omp_set_max_active_levels
@subsection @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
@table @asis
@item @emph{Description}:
This function limits the maximum allowed number of nested, active
parallel regions.  @var{max_levels} must be less than or equal to
the value returned by @code{omp_get_supported_active_levels}.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
@ref{omp_get_supported_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@end table


@node omp_get_max_active_levels
@subsection @code{omp_get_max_active_levels} -- Current maximum number of active regions
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel regions.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table


@node omp_get_level
@subsection @code{omp_get_level} -- Obtain the current nesting level
@table @asis
@item @emph{Description}:
This function returns the nesting level of the parallel regions
enclosing the call.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table


@node omp_get_ancestor_thread_num
@subsection @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
@table @asis
@item @emph{Description}:
This function returns the thread identification number for the given
nesting level of the current thread.  For values of @var{level} outside
the range zero to @code{omp_get_level}, -1 is returned; if @var{level} is
@code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table


@node omp_get_team_size
@subsection @code{omp_get_team_size} -- Number of threads in a team
@table @asis
@item @emph{Description}:
This function returns the number of threads in a thread team to which
either the current thread or its ancestor belongs.  For values of
@var{level} outside the range zero to @code{omp_get_level}, -1 is
returned; if @var{level} is zero, 1 is returned; and for
@code{omp_get_level}, the result is identical to
@code{omp_get_num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table


@node omp_get_active_level
@subsection @code{omp_get_active_level} -- Number of active parallel regions
@table @asis
@item @emph{Description}:
This function returns the nesting level of the active parallel regions
enclosing the call.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table


@node Thread Affinity Routines
@section Thread Affinity Routines

Routines controlling and accessing thread-affinity policies.
They have C linkage and do not throw exceptions.

@menu
* omp_get_proc_bind:: Whether threads may be moved between CPUs
@c * omp_get_num_places:: <fixme>
@c * omp_get_place_num_procs:: <fixme>
@c * omp_get_place_proc_ids:: <fixme>
@c * omp_get_place_num:: <fixme>
@c * omp_get_partition_num_places:: <fixme>
@c * omp_get_partition_place_nums:: <fixme>
@c * omp_set_affinity_format:: <fixme>
@c * omp_get_affinity_format:: <fixme>
@c * omp_display_affinity:: <fixme>
@c * omp_capture_affinity:: <fixme>
@end menu


@node omp_get_proc_bind
@subsection @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
@table @asis
@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_primary},
@code{omp_proc_bind_master}, @code{omp_proc_bind_close} and @code{omp_proc_bind_spread},
where @code{omp_proc_bind_master} is an alias for @code{omp_proc_bind_primary}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
@end multitable

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table


@node Teams Region Routines
@section Teams Region Routines

Routines controlling the league of teams that are executed in a @code{teams}
region.  They have C linkage and do not throw exceptions.

@menu
* omp_get_num_teams:: Number of teams
* omp_get_team_num:: Get team number
* omp_set_num_teams:: Set upper teams limit for teams region
* omp_get_max_teams:: Maximum number of teams for teams region
* omp_set_teams_thread_limit:: Set upper thread limit for teams construct
* omp_get_thread_limit:: Maximum number of threads
@end menu


@node omp_get_num_teams
@subsection @code{omp_get_num_teams} -- Number of teams
@table @asis
@item @emph{Description}:
Returns the number of teams in the current teams region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table


@node omp_get_team_num
@subsection @code{omp_get_team_num} -- Get team number
@table @asis
@item @emph{Description}:
Returns the team number of the calling thread.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table


@node omp_set_num_teams
@subsection @code{omp_set_num_teams} -- Set upper teams limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound on the number of teams created by a teams
construct that does not specify a @code{num_teams} clause.  The
argument of @code{omp_set_num_teams} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_teams(int num_teams);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_teams(num_teams)}
@item @tab @code{integer, intent(in) :: num_teams}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_TEAMS}, @ref{omp_get_num_teams}, @ref{omp_get_max_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.3.
@end table


@node omp_get_max_teams
@subsection @code{omp_get_max_teams} -- Maximum number of teams for teams region
@table @asis
@item @emph{Description}:
Return the maximum number of teams used for a teams region
that does not use the @code{num_teams} clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_teams}, @ref{omp_get_num_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
@end table


@node omp_set_teams_thread_limit
@subsection @code{omp_set_teams_thread_limit} -- Set upper thread limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound on the number of threads that will be available
for each team created by the teams construct which does not specify a
@code{thread_limit} clause.  The argument of
@code{omp_set_teams_thread_limit} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_teams_thread_limit(int thread_limit);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_teams_thread_limit(thread_limit)}
@item @tab @code{integer, intent(in) :: thread_limit}
@end multitable

@item @emph{See also}:
@ref{OMP_TEAMS_THREAD_LIMIT}, @ref{omp_get_teams_thread_limit}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.5.
@end table


@node omp_get_thread_limit
@subsection @code{omp_get_thread_limit} -- Maximum number of threads
@table @asis
@item @emph{Description}:
Return the maximum number of threads of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table


@node Tasking Routines
@section Tasking Routines

Routines relating to explicit tasks.
They have C linkage and do not throw exceptions.

@menu
* omp_get_max_task_priority:: Maximum task priority value that can be set
* omp_in_explicit_task:: Whether a given task is an explicit task
* omp_in_final:: Whether in final or included task region
@end menu


@node omp_get_max_task_priority
@subsection @code{omp_get_max_task_priority} -- Maximum priority value that can be set for tasks
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table


@node omp_in_explicit_task
@subsection @code{omp_in_explicit_task} -- Whether a given task is an explicit task
@table @asis
@item @emph{Description}:
The function returns the @var{explicit-task-var} ICV; it returns true when the
encountering task was generated by a task-generating construct such as
@code{target}, @code{task} or @code{taskloop}.  Otherwise, the encountering
task is in an implicit task region, such as those generated by an implicit
or explicit @code{parallel} region, and @code{omp_in_explicit_task} returns
false.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_explicit_task(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_explicit_task()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 18.5.2.
@end table


@node omp_in_final
@subsection @code{omp_in_final} -- Whether in final or included task region
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table


@c @node Resource Relinquishing Routines
@c @section Resource Relinquishing Routines
@c
@c Routines releasing resources used by the OpenMP runtime.
@c They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_pause_resource:: <fixme>
@c * omp_pause_resource_all:: <fixme>
@c @end menu

@node Device Information Routines
@section Device Information Routines

Routines related to devices available to an OpenMP program.
They have C linkage and do not throw exceptions.

@menu
* omp_get_num_procs:: Number of processors online
@c * omp_get_max_progress_width:: <fixme>/TR11
* omp_set_default_device:: Set the default device for target regions
* omp_get_default_device:: Get the default device for target regions
* omp_get_num_devices:: Number of target devices
* omp_get_device_num:: Get device that current thread is running on
* omp_is_initial_device:: Whether executing on the host device
* omp_get_initial_device:: Device number of host device
@end menu


@node omp_get_num_procs
@subsection @code{omp_get_num_procs} -- Number of processors online
@table @asis
@item @emph{Description}:
Returns the number of processors online on that device.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table


1505@node omp_set_default_device
506f068e 1506@subsection @code{omp_set_default_device} -- Set the default device for target regions
d77de738
ML
1507@table @asis
1508@item @emph{Description}:
1509Set the default device for target regions without device clause. The argument
1510shall be a nonnegative device number.
1511
1512@item @emph{C/C++}:
1513@multitable @columnfractions .20 .80
1514@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
1515@end multitable
1516
1517@item @emph{Fortran}:
1518@multitable @columnfractions .20 .80
1519@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
1520@item @tab @code{integer device_num}
1521@end multitable
1522
1523@item @emph{See also}:
1524@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
1525
1526@item @emph{Reference}:
1527@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
1528@end table



@node omp_get_default_device
@subsection @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without a device clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table



@node omp_get_num_devices
@subsection @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table



@node omp_get_device_num
@subsection @code{omp_get_device_num} -- Return device number of current device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the device that the
current thread is executing on. For OpenMP 5.0, this must be equal to the
value returned by the @code{omp_get_initial_device} function when called
from the host.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_initial_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
@end table



@node omp_is_initial_device
@subsection @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise. Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table



@node omp_get_initial_device
@subsection @code{omp_get_initial_device} -- Return device number of initial device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the host device.
For OpenMP 5.1, this must be equal to the value returned by the
@code{omp_get_num_devices} function.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_devices}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
@end table



@c @node Device Memory Routines
@c @section Device Memory Routines
@c
@c Routines related to memory allocation and managing corresponding
@c pointers on devices. They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_target_alloc:: <fixme>
@c * omp_target_free:: <fixme>
@c * omp_target_is_present:: <fixme>
@c * omp_target_is_accessible:: <fixme>
@c * omp_target_memcpy:: <fixme>
@c * omp_target_memcpy_rect:: <fixme>
@c * omp_target_memcpy_async:: <fixme>
@c * omp_target_memcpy_rect_async:: <fixme>
@c * omp_target_associate_ptr:: <fixme>
@c * omp_target_disassociate_ptr:: <fixme>
@c * omp_get_mapped_ptr:: <fixme>
@c @end menu

@node Lock Routines
@section Lock Routines

Initialize, set, test, unset and destroy simple and nested locks.
The routines have C linkage and do not throw exceptions.

@menu
* omp_init_lock:: Initialize simple lock
* omp_init_nest_lock:: Initialize nested lock
@c * omp_init_lock_with_hint:: <fixme>
@c * omp_init_nest_lock_with_hint:: <fixme>
* omp_destroy_lock:: Destroy simple lock
* omp_destroy_nest_lock:: Destroy nested lock
* omp_set_lock:: Wait for and set simple lock
* omp_set_nest_lock:: Wait for and set nested lock
* omp_unset_lock:: Unset simple lock
* omp_unset_nest_lock:: Unset nested lock
* omp_test_lock:: Test and set simple lock if available
* omp_test_nest_lock:: Test and set nested lock if available
@end menu


@node omp_init_lock
@subsection @code{omp_init_lock} -- Initialize simple lock
@table @asis
@item @emph{Description}:
Initialize a simple lock. After initialization, the lock is in
an unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table



@node omp_init_nest_lock
@subsection @code{omp_init_nest_lock} -- Initialize nested lock
@table @asis
@item @emph{Description}:
Initialize a nested lock. After initialization, the lock is in
an unlocked state and the nesting count is set to zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table



@node omp_destroy_lock
@subsection @code{omp_destroy_lock} -- Destroy simple lock
@table @asis
@item @emph{Description}:
Destroy a simple lock. In order to be destroyed, a simple lock must be
in the unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table



@node omp_destroy_nest_lock
@subsection @code{omp_destroy_nest_lock} -- Destroy nested lock
@table @asis
@item @emph{Description}:
Destroy a nested lock. In order to be destroyed, a nested lock must be
in the unlocked state and its nesting count must equal zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table



@node omp_set_lock
@subsection @code{omp_set_lock} -- Wait for and set simple lock
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread,
a deadlock occurs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_set_nest_lock
@subsection @code{omp_set_nest_lock} -- Wait for and set nested lock
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread, the
nesting count for the lock is incremented.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_unset_lock
@subsection @code{omp_unset_lock} -- Unset simple lock
@table @asis
@item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}. The lock then becomes unlocked. If one
or more threads attempted to set the lock before, one of them is chosen to
acquire the lock.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_set_lock}, @ref{omp_test_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_unset_nest_lock
@subsection @code{omp_unset_nest_lock} -- Unset nested lock
@table @asis
@item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the lock before,
one of them is chosen to acquire the lock.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_set_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_test_lock
@subsection @code{omp_test_lock} -- Test and set simple lock if available
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
does not block if the lock is not available. This function returns
@code{true} upon success, @code{false} otherwise. Here, @code{true} and
@code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_test_nest_lock
@subsection @code{omp_test_nest_lock} -- Test and set nested lock if available
@table @asis
@item @emph{Description}:
Before setting a nested lock, the lock variable must be initialized by
@code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
@code{omp_test_nest_lock} does not block if the lock is not available.
If the lock is already held by the current thread, the new nesting count
is returned. Otherwise, the return value equals zero.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
@end multitable

@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node Timing Routines
@section Timing Routines

Portable, thread-based, wall clock timer.
The routines have C linkage and do not throw exceptions.

@menu
* omp_get_wtick:: Get timer precision.
* omp_get_wtime:: Elapsed wall clock time.
@end menu


@node omp_get_wtick
@subsection @code{omp_get_wtick} -- Get timer precision
@table @asis
@item @emph{Description}:
Gets the timer precision, i.e., the number of seconds between two
successive clock ticks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtime}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
@end table



@node omp_get_wtime
@subsection @code{omp_get_wtime} -- Elapsed wall clock time
@table @asis
@item @emph{Description}:
Elapsed wall clock time in seconds. The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from some ``time in the past'', which is an arbitrary time
guaranteed not to change during the execution of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
@end multitable

@item @emph{See also}:
@ref{omp_get_wtick}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
@end table



@node Event Routine
@section Event Routine

Support for event objects.
The routine has C linkage and does not throw exceptions.

@menu
* omp_fulfill_event:: Fulfill and destroy an OpenMP event.
@end menu


@node omp_fulfill_event
@subsection @code{omp_fulfill_event} -- Fulfill and destroy an OpenMP event
@table @asis
@item @emph{Description}:
Fulfill the event associated with the event handle argument. Currently,
it is only used to fulfill events generated by detach clauses on task
constructs; the effect of fulfilling the event is to allow the task to
complete.

The result of calling @code{omp_fulfill_event} with an event handle other
than that generated by a detach clause is undefined. Calling it with an
event handle that has already been fulfilled is also undefined.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_fulfill_event(omp_event_handle_t event);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_fulfill_event(event)}
@item @tab @code{integer (kind=omp_event_handle_kind) :: event}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.5.1.
@end table



@c @node Interoperability Routines
@c @section Interoperability Routines
@c
@c Routines to obtain properties from an @code{omp_interop_t} object.
@c They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_get_num_interop_properties:: <fixme>
@c * omp_get_interop_int:: <fixme>
@c * omp_get_interop_ptr:: <fixme>
@c * omp_get_interop_str:: <fixme>
@c * omp_get_interop_name:: <fixme>
@c * omp_get_interop_type_desc:: <fixme>
@c * omp_get_interop_rc_desc:: <fixme>
@c @end menu

@c @node Memory Management Routines
@c @section Memory Management Routines
@c
@c Routines to manage and allocate memory on the current device.
@c They have C linkage and do not throw exceptions.
@c
@c @menu
@c * omp_init_allocator:: <fixme>
@c * omp_destroy_allocator:: <fixme>
@c * omp_set_default_allocator:: <fixme>
@c * omp_get_default_allocator:: <fixme>
@c * omp_alloc:: <fixme>
@c * omp_aligned_alloc:: <fixme>
@c * omp_free:: <fixme>
@c * omp_calloc:: <fixme>
@c * omp_aligned_calloc:: <fixme>
@c * omp_realloc:: <fixme>
@c * omp_get_memspace_num_resources:: <fixme>/TR11
@c * omp_get_submemspace:: <fixme>/TR11
@c @end menu

@c @node Tool Control Routine
@c
@c FIXME

@c @node Environment Display Routine
@c @section Environment Display Routine
@c
@c Routine to display the OpenMP version number and the initial values of ICVs.
@c It has C linkage and does not throw exceptions.
@c
@c menu
@c * omp_display_env:: <fixme>
@c end menu

@c ---------------------------------------------------------------------
@c OpenMP Environment Variables
@c ---------------------------------------------------------------------

@node Environment Variables
@chapter OpenMP Environment Variables

The environment variables beginning with @env{OMP_} are defined by
Section 4 of the OpenMP specification in version 4.5 or in a later version
of the specification, while those beginning with @env{GOMP_} are GNU extensions.
Most @env{OMP_} environment variables have an associated internal control
variable (ICV).

For any OpenMP environment variable that sets an ICV and is neither
@code{OMP_DEFAULT_DEVICE} nor has global ICV scope, associated
device-specific environment variables exist. For them, the environment
variable without suffix affects the host. The suffix @code{_DEV_} followed
by a non-negative device number less than the number of available devices sets
the ICV for the corresponding device. The suffix @code{_DEV} sets the ICV
of all non-host devices for which a device-specific corresponding environment
variable has not been set, while the @code{_ALL} suffix sets the ICV of all
host and non-host devices for which a more specific corresponding environment
variable is not set.

@menu
* OMP_ALLOCATOR:: Set the default allocator
* OMP_AFFINITY_FORMAT:: Set the format string used for affinity display
* OMP_CANCELLATION:: Set whether cancellation is activated
* OMP_DISPLAY_AFFINITY:: Display thread affinity information
* OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
* OMP_DEFAULT_DEVICE:: Set the device used in target regions
* OMP_DYNAMIC:: Dynamic adjustment of threads
* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
* OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
* OMP_NESTED:: Nested parallel regions
* OMP_NUM_TEAMS:: Specifies the number of teams to use by teams region
* OMP_NUM_THREADS:: Specifies the number of threads to use
* OMP_PROC_BIND:: Whether threads may be moved between CPUs
* OMP_PLACES:: Specifies on which CPUs the threads should be placed
* OMP_STACKSIZE:: Set default thread stack size
* OMP_SCHEDULE:: How threads are scheduled
* OMP_TARGET_OFFLOAD:: Controls offloading behaviour
* OMP_TEAMS_THREAD_LIMIT:: Set the maximum number of threads imposed by teams
* OMP_THREAD_LIMIT:: Set the maximum number of threads
* OMP_WAIT_POLICY:: How waiting threads are handled
* GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
* GOMP_DEBUG:: Enable debugging output
* GOMP_STACKSIZE:: Set default thread stack size
* GOMP_SPINCOUNT:: Set the busy-wait spin count
* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
@end menu


@node OMP_ALLOCATOR
@section @env{OMP_ALLOCATOR} -- Set the default allocator
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{def-allocator-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Sets the default allocator that is used when no allocator has been specified
in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
routine is invoked with the @code{omp_null_allocator} allocator.
If unset, @code{omp_default_mem_alloc} is used.

The value can be a predefined allocator, a predefined memory space, or
a predefined memory space followed by a colon and a comma-separated list
of memory trait and value pairs, separated by @code{=}.

Note: The corresponding device environment variables are currently not
supported. Therefore, the non-host @var{def-allocator-var} ICVs are always
initialized to @code{omp_default_mem_alloc}. However, on all devices,
the @code{omp_set_default_allocator} API routine can be used to change the
value.

@multitable @columnfractions .45 .45
@headitem Predefined allocators @tab Associated predefined memory spaces
@item omp_default_mem_alloc @tab omp_default_mem_space
@item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
@item omp_const_mem_alloc @tab omp_const_mem_space
@item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
@item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
@item omp_cgroup_mem_alloc @tab --
@item omp_pteam_mem_alloc @tab --
@item omp_thread_mem_alloc @tab --
@end multitable

The predefined allocators use the default values for the traits, as listed
below, except that the last three allocators have the @code{access} trait
set to @code{cgroup}, @code{pteam}, and @code{thread}, respectively.

@multitable @columnfractions .25 .40 .25
@headitem Trait @tab Allowed values @tab Default value
@item @code{sync_hint} @tab @code{contended}, @code{uncontended},
                            @code{serialized}, @code{private}
                       @tab @code{contended}
@item @code{alignment} @tab Positive integer being a power of two
                       @tab 1 byte
@item @code{access} @tab @code{all}, @code{cgroup},
                         @code{pteam}, @code{thread}
                    @tab @code{all}
@item @code{pool_size} @tab Positive integer
                       @tab See @ref{Memory allocation}
@item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
                           @code{abort_fb}, @code{allocator_fb}
                      @tab See below
@item @code{fb_data} @tab @emph{unsupported as it needs an allocator handle}
                     @tab (none)
@item @code{pinned} @tab @code{true}, @code{false}
                    @tab @code{false}
@item @code{partition} @tab @code{environment}, @code{nearest},
                            @code{blocked}, @code{interleaved}
                       @tab @code{environment}
@end multitable

For the @code{fallback} trait, the default value is @code{null_fb} for the
@code{omp_default_mem_alloc} allocator and any allocator that is associated
with device memory; for all other allocators, it is @code{default_mem_fb}
by default.

Examples:
@smallexample
OMP_ALLOCATOR=omp_high_bw_mem_alloc
OMP_ALLOCATOR=omp_large_cap_mem_space
OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest
@end smallexample

@item @emph{See also}:
@ref{Memory allocation}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
@end table



@node OMP_AFFINITY_FORMAT
@section @env{OMP_AFFINITY_FORMAT} -- Set the format string used for affinity display
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{affinity-format-var}
@item @emph{Scope:} device
@item @emph{Description}:
Sets the format string used when displaying OpenMP thread affinity information.
Special values are output using @code{%} followed by an optional size
specification and then either the single-character field type or its long
name enclosed in curly braces; using @code{%%} will display a literal percent.
The size specification consists of an optional @code{0.} or @code{.} followed
by a positive integer, specifying the minimal width of the output. With
@code{0.} and numerical values, the output is padded with zeros on the left;
with @code{.}, the output is padded by spaces on the left; otherwise, the
output is padded by spaces on the right. If unset, the value is
``@code{level %L thread %i affinity %A}''.

Supported field types are:

@multitable @columnfractions .10 .25 .60
@item t @tab team_num @tab value returned by @code{omp_get_team_num}
@item T @tab num_teams @tab value returned by @code{omp_get_num_teams}
@item L @tab nesting_level @tab value returned by @code{omp_get_level}
@item n @tab thread_num @tab value returned by @code{omp_get_thread_num}
@item N @tab num_threads @tab value returned by @code{omp_get_num_threads}
@item a @tab ancestor_tnum
  @tab value returned by
       @code{omp_get_ancestor_thread_num(omp_get_level()-1)}
@item H @tab host @tab name of the host that executes the thread
@item P @tab process_id @tab process identifier
@item i @tab native_thread_id @tab native thread identifier
@item A @tab thread_affinity
  @tab comma separated list of integer values or ranges, representing the
       processors on which a process might execute, subject to affinity
       mechanisms
@end multitable

For instance, after setting

@smallexample
OMP_AFFINITY_FORMAT="%0.2a!%n!%.4L!%N;%.2t;%0.2T;%@{team_num@};%@{num_teams@};%A"
@end smallexample

with either @code{OMP_DISPLAY_AFFINITY} being set or when calling
@code{omp_display_affinity} with @code{NULL} or an empty string, the program
might display the following:

@smallexample
00!0! 1!4; 0;01;0;1;0-11
00!3! 1!4; 0;01;0;1;0-11
00!2! 1!4; 0;01;0;1;0-11
00!1! 1!4; 0;01;0;1;0-11
@end smallexample

@item @emph{See also}:
@ref{OMP_DISPLAY_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.14
@end table



@node OMP_CANCELLATION
@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{cancel-var}
@item @emph{Scope:} global
@item @emph{Description}:
If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
if unset, cancellation is disabled and the @code{cancel} construct is ignored.

@item @emph{See also}:
@ref{omp_get_cancellation}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
@end table
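For instance, in a POSIX shell (@code{./a.out} stands for any program compiled
with @option{-fopenmp}), cancellation support must be requested at startup:

```shell
# Without this setting, the cancel construct is ignored and
# omp_get_cancellation() returns false.
export OMP_CANCELLATION=TRUE
./a.out
```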



@node OMP_DISPLAY_AFFINITY
@section @env{OMP_DISPLAY_AFFINITY} -- Display thread affinity information
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{display-affinity-var}
@item @emph{Scope:} global
@item @emph{Description}:
If set to @code{FALSE} or if unset, affinity displaying is disabled.
If set to @code{TRUE}, the runtime will display affinity information about
OpenMP threads in a parallel region upon entering the region and every time
any change occurs.

@item @emph{See also}:
@ref{OMP_AFFINITY_FORMAT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.13
@end table




@node OMP_DISPLAY_ENV
@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
@cindex Environment Variable
@table @asis
@item @emph{ICV:} none
@item @emph{Scope:} not applicable
@item @emph{Description}:
If set to @code{TRUE}, the OpenMP version number and the values
associated with the OpenMP environment variables are printed to @code{stderr}.
If set to @code{VERBOSE}, it additionally shows the value of the environment
variables which are GNU extensions. If undefined or set to @code{FALSE},
this information will not be shown.


@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
@end table
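For instance, in a POSIX shell (@code{./a.out} being any program compiled with
@option{-fopenmp}):

```shell
# Print the OpenMP version banner and all standard variables to stderr;
# VERBOSE additionally lists the GNU extensions (GOMP_*).
OMP_DISPLAY_ENV=VERBOSE OMP_NUM_THREADS=4 ./a.out
```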



@node OMP_DEFAULT_DEVICE
@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{default-device-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Set to choose the device which is used in a @code{target} region, unless the
value is overridden by @code{omp_set_default_device} or by a @code{device}
clause. The value shall be a nonnegative device number. If no device with
the given device number exists, the code is executed on the host. If unset
and @env{OMP_TARGET_OFFLOAD} is @code{mandatory} while no non-host devices
are available, it is set to @code{omp_invalid_device}. Otherwise, if unset,
device number 0 will be used.


@item @emph{See also}:
@ref{omp_get_default_device}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
@end table
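For instance, in a POSIX shell:

```shell
# Run target regions on device 1; a device clause or a call to
# omp_set_default_device still takes precedence over this value.
export OMP_DEFAULT_DEVICE=1
./a.out
```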



@node OMP_DYNAMIC
@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{dyn-var}
@item @emph{Scope:} global
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team. The value of this environment variable shall be
@code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
disabled by default.

@item @emph{See also}:
@ref{omp_set_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
@end table



@node OMP_MAX_ACTIVE_LEVELS
@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{max-active-levels-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies the initial value for the maximum number of nested parallel
regions. The value of this variable shall be a positive integer.
If undefined, then if @env{OMP_NESTED} is defined and set to true, or
if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
a list with more than one item, the maximum number of nested parallel
regions will be initialized to the largest number supported, otherwise
it will be set to one.

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{OMP_NESTED}, @ref{OMP_PROC_BIND},
@ref{OMP_NUM_THREADS}


@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
@end table



@node OMP_MAX_TASK_PRIORITY
@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
number that can be set for a task.
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{max-task-priority-var}
@item @emph{Scope:} global
@item @emph{Description}:
Specifies the initial value for the maximum priority value that can be
set for a task. The value of this variable shall be a non-negative
integer, and zero is allowed. If undefined, the default priority is
0.

@item @emph{See also}:
@ref{omp_get_max_task_priority}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
@end table



@node OMP_NESTED
@section @env{OMP_NESTED} -- Nested parallel regions
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{max-active-levels-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams. The value of this environment variable
shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the maximum
number of active nested regions will by default be set to the maximum
supported; otherwise it will be set to one. If
@env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting will override this
setting. If both are undefined, nested parallel regions are enabled if
@env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined to a list with
more than one item, otherwise they are disabled by default.

Note that the @code{OMP_NESTED} environment variable was deprecated in
the OpenMP specification 5.2 in favor of @code{OMP_MAX_ACTIVE_LEVELS}.

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_set_nested},
@ref{OMP_MAX_ACTIVE_LEVELS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
@end table



@node OMP_NUM_TEAMS
@section @env{OMP_NUM_TEAMS} -- Specifies the number of teams to use by teams region
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{nteams-var}
@item @emph{Scope:} device
@item @emph{Description}:
Specifies the upper bound for the number of teams to use in teams regions
without an explicit @code{num_teams} clause. The value of this variable shall
be a positive integer. If undefined, it defaults to 0, which means an
implementation-defined upper bound.

@item @emph{See also}:
@ref{omp_set_num_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.23
@end table



@node OMP_NUM_THREADS
@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{nthreads-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies the default number of threads to use in parallel regions. The
value of this variable shall be a comma-separated list of positive integers;
each value specifies the number of threads to use for the corresponding
nesting level. Specifying more than one item in the list will automatically
enable nesting by default. If undefined, one thread per CPU is used.

When a list with more than one value is specified, it also affects the
@var{max-active-levels-var} ICV as described in @ref{OMP_MAX_ACTIVE_LEVELS}.

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{OMP_MAX_ACTIVE_LEVELS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
@end table
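For instance, in a POSIX shell:

```shell
# Four threads at the outermost parallel region, two per team at the
# next nesting level; a list with more than one item also affects
# max-active-levels-var as described above.
export OMP_NUM_THREADS=4,2
./a.out
```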



@node OMP_PROC_BIND
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{bind-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
they may be moved. Alternatively, a comma-separated list with the
values @code{PRIMARY}, @code{MASTER}, @code{CLOSE} and @code{SPREAD} can
be used to specify the thread affinity policy for the corresponding nesting
level. With @code{PRIMARY} and @code{MASTER} the worker threads are in the
same place partition as the primary thread. With @code{CLOSE} they are
kept close to the primary thread in contiguous place partitions. And
with @code{SPREAD} a sparse distribution
across the place partitions is used. Specifying more than one item in the
list will automatically enable nesting by default.

When a list is specified, it also affects the @var{max-active-levels-var} ICV
as described in @ref{OMP_MAX_ACTIVE_LEVELS}.

When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.

@item @emph{See also}:
@ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY}, @ref{OMP_PLACES},
@ref{OMP_MAX_ACTIVE_LEVELS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@end table



@node OMP_PLACES
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{place-partition-var}
@item @emph{Scope:} implicit tasks
@item @emph{Description}:
The thread placement can be either specified using an abstract name or by an
explicit list of the places. The abstract names @code{threads}, @code{cores},
@code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
followed by a positive number in parentheses, which denotes how many places
shall be created. With @code{threads} each place corresponds to a single
hardware thread; @code{cores} to a single core with the corresponding number of
hardware threads; with @code{sockets} the place corresponds to a single
socket; with @code{ll_caches} to a set of cores that shares the last level
cache on the device; and @code{numa_domains} to a set of cores for which their
closest memory on the device is the same memory and at a similar distance from
the cores. The resulting placement can be shown by setting the
@env{OMP_DISPLAY_ENV} environment variable.

Alternatively, the placement can be specified explicitly as a comma-separated
list of places. A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads. The curly braces can be omitted
when only a single number has been specified. The hardware threads
belonging to a place can either be specified as a comma-separated list of
nonnegative thread numbers or using an interval. Multiple places can also be
either specified by a comma-separated list of places or by an interval. To
specify an interval, a colon followed by the count is placed after
the hardware thread number or the place. Optionally, the length can be
followed by a colon and the stride number -- otherwise a unit stride is
assumed. Placing an exclamation mark (@code{!}) directly before a curly
brace or numbers inside the curly braces (excluding intervals) will
exclude those hardware threads.

For instance, the following specifies the same places list:
@code{"@{0,1,2@}, @{3,4,6@}, @{7,8,9@}, @{10,11,12@}"};
@code{"@{0:3@}, @{3:3@}, @{7:3@}, @{10:3@}"}; and @code{"@{0:2@}:4:3"}.

If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
between CPUs following no placement policy.

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
@ref{OMP_DISPLAY_ENV}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
@end table
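For instance, in a POSIX shell, the abstract and the explicit forms look as
follows:

```shell
# Abstract name: create four places, one per physical core.
export OMP_PLACES="cores(4)"

# Explicit interval form: place {0,1,2,3}, repeated 4 times with a
# stride of 4, i.e. the same as "{0:4},{4:4},{8:4},{12:4}".
export OMP_PLACES="{0:4}:4:4"
```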



@node OMP_STACKSIZE
@section @env{OMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{stacksize-var}
@item @emph{Scope:} device
@item @emph{Description}:
Set the default thread stack size in kilobytes, unless the number
is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
case the size is, respectively, in bytes, kilobytes, megabytes
or gigabytes. This is different from @code{pthread_attr_setstacksize}
which gets the number of bytes as an argument. If the stack size cannot
be set due to system constraints, an error is reported and the initial
stack size is left unchanged. If undefined, the stack size is system
dependent.

@item @emph{See also}:
@ref{GOMP_STACKSIZE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
@end table
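For instance, in a POSIX shell:

```shell
# Give each OpenMP thread a 2 megabyte stack; without a suffix the
# value is interpreted as kilobytes.
export OMP_STACKSIZE=2M
```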



@node OMP_SCHEDULE
@section @env{OMP_SCHEDULE} -- How threads are scheduled
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{run-sched-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or
@code{auto}. The optional @code{chunk} size shall be a positive integer.
If undefined, dynamic scheduling and a chunk size of 1 are used.

@item @emph{See also}:
@ref{omp_set_schedule}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
@end table
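For instance, in a POSIX shell; the setting is honored by loops that use the
@code{schedule(runtime)} clause:

```shell
# Dynamic scheduling with a chunk size of 16 iterations.
export OMP_SCHEDULE="dynamic,16"
./a.out
```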



@node OMP_TARGET_OFFLOAD
@section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behaviour
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{ICV:} @var{target-offload-var}
@item @emph{Scope:} global
@item @emph{Description}:
Specifies the behaviour with regard to offloading code to a device. This
variable can be set to one of three values: @code{MANDATORY}, @code{DISABLED}
or @code{DEFAULT}.

If set to @code{MANDATORY}, the program will terminate with an error if
the offload device is not present or is not supported. If set to
@code{DISABLED}, then offloading is disabled and all code will run on the
host. If set to @code{DEFAULT}, the program will try offloading to the
device first, then fall back to running code on the host if it cannot.

If undefined, then the program will behave as if @code{DEFAULT} was set.

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.17
@end table
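For instance, in a POSIX shell:

```shell
# Abort instead of silently falling back to host execution when no
# usable offload device is found.
OMP_TARGET_OFFLOAD=MANDATORY ./a.out

# Force all target regions onto the host, e.g. for debugging.
OMP_TARGET_OFFLOAD=DISABLED ./a.out
```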



@node OMP_TEAMS_THREAD_LIMIT
@section @env{OMP_TEAMS_THREAD_LIMIT} -- Set the maximum number of threads imposed by teams
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{teams-thread-limit-var}
@item @emph{Scope:} device
@item @emph{Description}:
Specifies an upper bound for the number of threads to use by each contention
group created by a teams construct without an explicit @code{thread_limit}
clause. The value of this variable shall be a positive integer. If undefined,
the value 0 is used, which stands for an implementation-defined upper
limit.

@item @emph{See also}:
@ref{OMP_THREAD_LIMIT}, @ref{omp_set_teams_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.24
@end table



@node OMP_THREAD_LIMIT
@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
@cindex Environment Variable
@table @asis
@item @emph{ICV:} @var{thread-limit-var}
@item @emph{Scope:} data environment
@item @emph{Description}:
Specifies the number of threads to use for the whole program. The
value of this variable shall be a positive integer. If undefined,
the number of threads is not limited.

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
@end table



@node OMP_WAIT_POLICY
@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that they
should. If undefined, threads wait actively for a short time
before waiting passively.

@item @emph{See also}:
@ref{GOMP_SPINCOUNT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
@end table



@node GOMP_CPU_AFFINITY
@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Binds threads to specific CPUs. The variable should contain a space-separated
or comma-separated list of CPUs. This list may contain different kinds of
entries: either single CPU numbers in any order, a range of CPUs (M-N)
or a range with some stride (M-N:S). CPU numbers are zero based. For example,
@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
and 14 respectively and then start assigning back from the beginning of
the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.

There is no libgomp library routine to determine whether a CPU affinity
specification is in effect. As a workaround, language-specific library
functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
environment variable. A defined CPU affinity on startup cannot be changed
or disabled during the runtime of the application.

If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence. If @env{GOMP_CPU_AFFINITY} is
unset and @env{OMP_PROC_BIND} is either unset or set to @code{FALSE}, the
host system will handle the assignment of threads to CPUs.

@item @emph{See also}:
@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
@end table



@node GOMP_DEBUG
@section @env{GOMP_DEBUG} -- Enable debugging output
@cindex Environment Variable
@table @asis
@item @emph{Description}:
Enable debugging output. The variable should be set to @code{0}
(disabled, also the default if not set), or @code{1} (enabled).

If enabled, some debugging output will be printed during execution.
This is currently not specified in more detail, and subject to change.
@end table



@node GOMP_STACKSIZE
@section @env{GOMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Set the default thread stack size in kilobytes. This is different from
@code{pthread_attr_setstacksize} which gets the number of bytes as an
argument. If the stack size cannot be set due to system constraints, an
error is reported and the initial stack size is left unchanged. If undefined,
the stack size is system dependent.

@item @emph{See also}:
@ref{OMP_STACKSIZE}

@item @emph{Reference}:
@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
GCC Patches Mailinglist},
@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
GCC Patches Mailinglist}
@end table



@node GOMP_SPINCOUNT
@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power. The value may be
either @code{INFINITE}, @code{INFINITY} to always wait actively or an
integer which gives the number of spins of the busy-wait loop. The
integer may optionally be followed by the following suffixes acting
as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
million), @code{G} (giga, billion), or @code{T} (tera, trillion).
If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless @env{GOMP_SPINCOUNT} is lower
or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.

@item @emph{See also}:
@ref{OMP_WAIT_POLICY}
@end table



@node GOMP_RTEMS_THREAD_POOLS
@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
@cindex Environment Variable
@cindex Implementation specific setting
@table @asis
@item @emph{Description}:
This environment variable is only used on the RTEMS real-time operating system.
It determines the scheduler instance specific thread pools. The format for
@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
separated by @code{:} where:
@itemize @bullet
@item @code{<thread-pool-count>} is the thread pool count for this scheduler
instance.
@item @code{$<priority>} is an optional priority for the worker threads of a
thread pool according to @code{pthread_setschedparam}. In case a priority
value is omitted, then a worker thread will inherit the priority of the OpenMP
primary thread that created it. The priority of the worker thread is not
changed after creation, even if a new OpenMP primary thread using the worker has
a different priority.
@item @code{@@<scheduler-name>} is the scheduler instance name according to the
RTEMS application configuration.
@end itemize
In case no thread pool configuration is specified for a scheduler instance,
then each OpenMP primary thread of this scheduler instance will use its own
dynamically allocated thread pool. To limit the worker thread count of the
thread pools, each OpenMP primary thread must call @code{omp_set_num_threads}.
@item @emph{Example}:
Let's suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
one thread pool available. Since no priority is specified for this scheduler
instance, the worker thread inherits the priority of the OpenMP primary thread
that created it. In the scheduler instance @code{WRK1} there are three thread
pools available and their worker threads run at priority four.
@end table



@c ---------------------------------------------------------------------
@c Enabling OpenACC
@c ---------------------------------------------------------------------

@node Enabling OpenACC
@chapter Enabling OpenACC

To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
flag @option{-fopenacc} must be specified. This enables the OpenACC directive
@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
@code{!$} conditional compilation sentinels in free form and @code{c$},
@code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
arranges for automatic linking of the OpenACC runtime library
(@ref{OpenACC Runtime Library Routines}).

See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.

A complete description of all OpenACC directives accepted may be found in
the @uref{https://www.openacc.org, OpenACC} Application Programming
Interface manual, version 2.6.



@c ---------------------------------------------------------------------
@c OpenACC Runtime Library Routines
@c ---------------------------------------------------------------------

@node OpenACC Runtime Library Routines
@chapter OpenACC Runtime Library Routines

The runtime routines described here are defined by section 3 of the OpenACC
specification, version 2.6.
They have C linkage and do not throw exceptions.
Generally, they are available only for the host, with the exception of
@code{acc_on_device}, which is available for both the host and the
acceleration device.

@menu
* acc_get_num_devices:: Get number of devices for the given device
                        type.
* acc_set_device_type:: Set type of device accelerator to use.
* acc_get_device_type:: Get type of device accelerator to be used.
* acc_set_device_num:: Set device number to use.
* acc_get_device_num:: Get device number to be used.
* acc_get_property:: Get device property.
* acc_async_test:: Tests for completion of a specific asynchronous
                        operation.
* acc_async_test_all:: Tests for completion of all asynchronous
                        operations.
* acc_wait:: Wait for completion of a specific asynchronous
                        operation.
* acc_wait_all:: Waits for completion of all asynchronous
                        operations.
* acc_wait_all_async:: Wait for completion of all asynchronous
                        operations.
* acc_wait_async:: Wait for completion of asynchronous operations.
* acc_init:: Initialize runtime for a specific device type.
* acc_shutdown:: Shuts down the runtime for a specific device
                        type.
* acc_on_device:: Whether executing on a particular device
* acc_malloc:: Allocate device memory.
* acc_free:: Free device memory.
* acc_copyin:: Allocate device memory and copy host memory to
                        it.
* acc_present_or_copyin:: If the data is not present on the device,
                        allocate device memory and copy from host
                        memory.
* acc_create:: Allocate device memory and map it to host
                        memory.
* acc_present_or_create:: If the data is not present on the device,
                        allocate device memory and map it to host
                        memory.
* acc_copyout:: Copy device memory to host memory.
* acc_delete:: Free device memory.
* acc_update_device:: Update device memory from mapped host memory.
* acc_update_self:: Update host memory from mapped device memory.
* acc_map_data:: Map previously allocated device memory to host
                        memory.
* acc_unmap_data:: Unmap device memory from host memory.
* acc_deviceptr:: Get device pointer associated with specific
                        host address.
* acc_hostptr:: Get host pointer associated with specific
                        device address.
* acc_is_present:: Indicate whether host variable / array is
                        present on device.
* acc_memcpy_to_device:: Copy host memory to device memory.
* acc_memcpy_from_device:: Copy device memory to host memory.
* acc_attach:: Let device pointer point to device-pointer target.
* acc_detach:: Let device pointer point to host-pointer target.

API routines for target platforms.

* acc_get_current_cuda_device:: Get CUDA device handle.
* acc_get_current_cuda_context:: Get CUDA context handle.
* acc_get_cuda_stream:: Get CUDA stream handle.
* acc_set_cuda_stream:: Set CUDA stream handle.

API routines for the OpenACC Profiling Interface.

* acc_prof_register:: Register callbacks.
* acc_prof_unregister:: Unregister callbacks.
* acc_prof_lookup:: Obtain inquiry functions.
* acc_register_library:: Library registration.
@end menu



@node acc_get_num_devices
@section @code{acc_get_num_devices} -- Get number of devices for given device type
@table @asis
@item @emph{Description}
This function returns a value indicating the number of devices available
for the device type specified in @var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section 3.2.1.
@end table



@node acc_set_device_type
@section @code{acc_set_device_type} -- Set type of device accelerator to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime library which device type, specified
in @var{devicetype}, to use when executing a parallel or kernels region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_set_device_type(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section 3.2.2.
@end table
3098
3099
3100
3101@node acc_get_device_type
3102@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
3103@table @asis
3104@item @emph{Description}
This function returns the device type that will be used when executing a
parallel or kernels region.
3107
This function returns @code{acc_device_none} if
@code{acc_get_device_type} is called from the
@code{acc_ev_device_init_start} or @code{acc_ev_device_init_end}
callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
Interface}), that is, while the device is being initialized.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.3.
@end table



@node acc_set_device_num
@section @code{acc_set_device_num} -- Set device number to use.
@table @asis
@item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{devicenum}, of the device type @var{devicetype} to use.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_set_device_num(int devicenum, acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.4.
@end table



@node acc_get_device_num
@section @code{acc_get_device_num} -- Get device number to be used.
@table @asis
@item @emph{Description}
This function returns the device number of the specified device type
@var{devicetype} that will be used when executing a parallel or kernels
region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer acc_get_device_num}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.5.
@end table



@node acc_get_property
@section @code{acc_get_property} -- Get device property.
@cindex acc_get_property
@cindex acc_get_property_string
@table @asis
@item @emph{Description}
These routines return the value of the specified @var{property} for the
device being queried according to @var{devicenum} and @var{devicetype}.
Integer-valued and string-valued properties are returned by
@code{acc_get_property} and @code{acc_get_property_string} respectively.
The Fortran @code{acc_get_property_string} subroutine returns the string
in its fourth argument, while the remaining entry points are functions
that pass the return value as their result.

Note, for Fortran only: the OpenACC technical committee corrected and, hence,
modified the interface introduced in OpenACC 2.6.  The kind-value parameter
@code{acc_device_property} has been renamed to @code{acc_device_property_kind}
for consistency and the return type of the @code{acc_get_property} function is
now a @code{c_size_t} integer instead of an @code{acc_device_property} integer.
The parameter @code{acc_device_property} will continue to be provided,
but might be removed in a future version of GCC.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
@item @tab @code{use ISO_C_Binding, only: c_size_t}
@item @tab @code{integer devicenum}
@item @tab @code{integer(kind=acc_device_kind) devicetype}
@item @tab @code{integer(kind=acc_device_property_kind) property}
@item @tab @code{integer(kind=c_size_t) acc_get_property}
@item @tab @code{character(*) string}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.6.
@end table



@node acc_async_test
@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}.  In C/C++, a non-zero value is returned if the specified
asynchronous operation has completed and zero if it has not; in Fortran,
@code{true} or @code{false} is returned, respectively.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
@item @tab @code{integer(kind=acc_handle_kind) arg}
@item @tab @code{logical acc_async_test}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.9.
@end table



@node acc_async_test_all
@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned if all asynchronous operations
have completed and zero if any has not; in Fortran, @code{true} or
@code{false} is returned, respectively.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.10.
@end table



@node acc_wait
@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
@table @asis
@item @emph{Description}
This function waits for completion of the asynchronous operation
specified in @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(int arg);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
@item @tab @code{integer(acc_handle_kind) arg}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.11.
@end table



@node acc_wait_all
@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function waits for the completion of all asynchronous operations.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.13.
@end table



@node acc_wait_all_async
@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
@item @tab @code{integer(acc_handle_kind) async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.14.
@end table



@node acc_wait_async
@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
@table @asis
@item @emph{Description}
This function enqueues a wait operation on queue @var{async} for any and all
asynchronous operations enqueued on queue @var{arg}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
@item @tab @code{integer(acc_handle_kind) arg, async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.12.
@end table



@node acc_init
@section @code{acc_init} -- Initialize runtime for a specific device type.
@table @asis
@item @emph{Description}
This function initializes the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.7.
@end table



@node acc_shutdown
@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
@table @asis
@item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.8.
@end table



@node acc_on_device
@section @code{acc_on_device} -- Whether executing on a particular device
@table @asis
@item @emph{Description}:
This function returns whether the program is executing on the device
type specified in @var{devicetype}.  In C/C++, a non-zero value is
returned if the program is executing on the specified device type and
zero if it is not; in Fortran, @code{true} or @code{false} is returned,
respectively.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
@item @tab @code{integer(acc_device_kind) devicetype}
@item @tab @code{logical acc_on_device}
@end multitable


@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.17.
@end table



@node acc_malloc
@section @code{acc_malloc} -- Allocate device memory.
@table @asis
@item @emph{Description}
This function allocates @var{len} bytes of device memory.  It returns
the device address of the allocated memory.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.18.
@end table



@node acc_free
@section @code{acc_free} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees previously allocated device memory at the device
address @var{a}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.19.
@end table



@node acc_copyin
@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
@table @asis
@item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory
and maps it to the specified host address in @var{a}.  The device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_present_or_copyin
@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
@table @asis
@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present or not.  If it is not present, then device memory
will be allocated and the host memory copied.  The device address of
the newly allocated device memory is returned.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.20.
@end table



@node acc_create
@section @code{acc_create} -- Allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function allocates device memory and maps it to host memory specified
by the host address @var{a} with a length of @var{len} bytes.  In C/C++,
the function returns the device address of the allocated device memory.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_present_or_create
@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
@table @asis
@item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present or not.  If it is not present, then device memory
will be allocated and mapped to host memory.  In C/C++, the device address
of the newly allocated device memory is returned.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.21.
@end table



@node acc_copyout
@section @code{acc_copyout} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
In C/C++, this function copies mapped device memory to the host memory
specified by the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.22.
@end table



@node acc_delete
@section @code{acc_delete} -- Free device memory.
@table @asis
@item @emph{Description}
This function frees device memory previously allocated for the host
address @var{a} with a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.23.
@end table



@node acc_update_device
@section @code{acc_update_device} -- Update device memory from mapped host memory.
@table @asis
@item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.24.
@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.25.
@end table



@node acc_map_data
@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
@table @asis
@item @emph{Description}
This function maps previously allocated device and host memory.  The device
memory is specified with the device address @var{d}.  The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.
3877
3878@item @emph{C/C++}:
3879@multitable @columnfractions .20 .80
3880@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
3881@end multitable
3882
3883@item @emph{Reference}:
3884@uref{https://www.openacc.org, OpenACC specification v2.6}, section
38853.2.26.
3886@end table
3887
3888
3889
3890@node acc_unmap_data
3891@section @code{acc_unmap_data} -- Unmap device memory from host memory.
3892@table @asis
3893@item @emph{Description}
3894This function unmaps previously mapped device and host memory. The host
3895memory is specified by @var{h}.
3896
3897@item @emph{C/C++}:
3898@multitable @columnfractions .20 .80
3899@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
3900@end multitable
3901
3902@item @emph{Reference}:
3903@uref{https://www.openacc.org, OpenACC specification v2.6}, section
39043.2.27.
3905@end table
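
As a sketch, @code{acc_map_data} and @code{acc_unmap_data} are typically used
as a pair: device memory obtained from @code{acc_malloc} is associated with an
existing host buffer, used, and unmapped again before both buffers are freed:

@smallexample
 float *h = malloc (1024 * sizeof (float));
 void *d = acc_malloc (1024 * sizeof (float));

 /* Associate the device buffer with the host buffer.  */
 acc_map_data (h, d, 1024 * sizeof (float));

 /* ... use the mapping, e.g., in compute regions ... */

 /* Dissolve the association again.  */
 acc_unmap_data (h);
 acc_free (d);
 free (h);
@end smallexample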
3906
3907
3908
3909@node acc_deviceptr
3910@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
3911@table @asis
3912@item @emph{Description}
3913This function returns the device address that has been mapped to the
3914host address specified by @var{h}.
3915
3916@item @emph{C/C++}:
3917@multitable @columnfractions .20 .80
3918@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
3919@end multitable
3920
3921@item @emph{Reference}:
3922@uref{https://www.openacc.org, OpenACC specification v2.6}, section
39233.2.28.
3924@end table
3925
3926
3927
3928@node acc_hostptr
3929@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
3930@table @asis
3931@item @emph{Description}
3932This function returns the host address that has been mapped to the
3933device address specified by @var{d}.
3934
3935@item @emph{C/C++}:
3936@multitable @columnfractions .20 .80
3937@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
3938@end multitable
3939
3940@item @emph{Reference}:
3941@uref{https://www.openacc.org, OpenACC specification v2.6}, section
39423.2.29.
3943@end table
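
For mapped memory, the two translation functions @code{acc_deviceptr} and
@code{acc_hostptr} are inverses of each other. A sketch, assuming a host
array @code{h_a} that has been mapped with @code{acc_copyin}:

@smallexample
 acc_copyin (h_a, sizeof (h_a));

 /* Device address corresponding to the mapped host address.  */
 void *d_a = acc_deviceptr (h_a);

 /* Translating back yields the original host address:
    acc_hostptr (d_a) == (void *) h_a.  */
@end smallexample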
3944
3945
3946
3947@node acc_is_present
3948@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
3949@table @asis
3950@item @emph{Description}
3951This function indicates whether the specified host address in @var{a} and a
3952length of @var{len} bytes is present on the device. In C/C++, a non-zero
3953value is returned to indicate the presence of the mapped memory on the
3954device. A zero is returned to indicate the memory is not mapped on the
3955device.
3956
3957In Fortran, two forms are supported. In the first form, @var{a} specifies
3958a contiguous array section. In the second form, @var{a} specifies a variable or
3959array element and @var{len} specifies the length in bytes. If the host
3960memory is mapped to device memory, then @code{true} is returned. Otherwise,
3961@code{false} is returned to indicate the mapped memory is not present.
3962
3963@item @emph{C/C++}:
3964@multitable @columnfractions .20 .80
3965@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
3966@end multitable
3967
3968@item @emph{Fortran}:
3969@multitable @columnfractions .20 .80
3970@item @emph{Interface}: @tab @code{function acc_is_present(a)}
3971@item @tab @code{type, dimension(:[,:]...) :: a}
3972@item @tab @code{logical acc_is_present}
3973@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
3974@item @tab @code{type, dimension(:[,:]...) :: a}
3975@item @tab @code{integer len}
3976@item @tab @code{logical acc_is_present}
3977@end multitable
3978
3979@item @emph{Reference}:
3980@uref{https://www.openacc.org, OpenACC specification v2.6}, section
39813.2.30.
3982@end table
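
In C/C++, the return value can be used directly as a truth value, for
example (a sketch, assuming a host array @code{h_a}) to map memory only if
it is not already present on the device:

@smallexample
 if (!acc_is_present (h_a, sizeof (h_a)))
   acc_copyin (h_a, sizeof (h_a));
@end smallexample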
3983
3984
3985
3986@node acc_memcpy_to_device
3987@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
3988@table @asis
3989@item @emph{Description}
3990This function copies host memory specified by the host address @var{src}
3991to device memory specified by the device address @var{dest} for a length
3992of @var{bytes} bytes.
3993
3994@item @emph{C/C++}:
3995@multitable @columnfractions .20 .80
3996@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
3997@end multitable
3998
3999@item @emph{Reference}:
4000@uref{https://www.openacc.org, OpenACC specification v2.6}, section
40013.2.31.
4002@end table
4003
4004
4005
4006@node acc_memcpy_from_device
4007@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
4008@table @asis
4009@item @emph{Description}
4010This function copies device memory specified by the device address @var{src}
4011to host memory specified by the host address @var{dest} for a length of
4012@var{bytes} bytes.
4013
4014@item @emph{C/C++}:
4015@multitable @columnfractions .20 .80
4016@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
4017@end multitable
4018
4019@item @emph{Reference}:
4020@uref{https://www.openacc.org, OpenACC specification v2.6}, section
40213.2.32.
4022@end table
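
A sketch of a round trip through device memory allocated with
@code{acc_malloc}, using hypothetical host buffers @code{in} and @code{out}:

@smallexample
 float in[1024], out[1024];
 void *d = acc_malloc (sizeof (in));

 /* Host-to-device copy.  */
 acc_memcpy_to_device (d, in, sizeof (in));

 /* ... operate on the device data ... */

 /* Device-to-host copy.  */
 acc_memcpy_from_device (out, d, sizeof (out));

 acc_free (d);
@end smallexample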
4023
4024
4025
4026@node acc_attach
4027@section @code{acc_attach} -- Let device pointer point to device-pointer target.
4028@table @asis
4029@item @emph{Description}
4030This function updates a pointer on the device from pointing to a host-pointer
4031address to pointing to the corresponding device data.
4032
4033@item @emph{C/C++}:
4034@multitable @columnfractions .20 .80
4035@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
4036@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
4037@end multitable
4038
4039@item @emph{Reference}:
4040@uref{https://www.openacc.org, OpenACC specification v2.6}, section
40413.2.34.
4042@end table
4043
4044
4045
4046@node acc_detach
4047@section @code{acc_detach} -- Let device pointer point to host-pointer target.
4048@table @asis
4049@item @emph{Description}
4050This function updates a pointer on the device from pointing to a device-pointer
4051address to pointing to the corresponding host data.
4052
4053@item @emph{C/C++}:
4054@multitable @columnfractions .20 .80
4055@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
4056@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
4057@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
4058@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
4059@end multitable
4060
4061@item @emph{Reference}:
4062@uref{https://www.openacc.org, OpenACC specification v2.6}, section
40633.2.35.
4064@end table
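
For example, when a struct containing a pointer member has been copied to
the device, @code{acc_attach} makes the device copy of the member point at
the corresponding device data. A sketch:

@smallexample
 struct s @{ float *arr; @} v;
 v.arr = malloc (1024 * sizeof (float));

 acc_copyin (&v, sizeof (v));
 acc_copyin (v.arr, 1024 * sizeof (float));

 /* Make the device copy of v.arr point to the device copy of the array.  */
 acc_attach ((h_void **) &v.arr);

 /* ... compute regions may now follow v.arr on the device ... */

 /* Restore the device copy of v.arr to the host address.  */
 acc_detach ((h_void **) &v.arr);
@end smallexample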
4065
4066
4067
4068@node acc_get_current_cuda_device
4069@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
4070@table @asis
4071@item @emph{Description}
4072This function returns the CUDA device handle. This handle is the same
4073as used by the CUDA Runtime or Driver APIs.
4074
4075@item @emph{C/C++}:
4076@multitable @columnfractions .20 .80
4077@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
4078@end multitable
4079
4080@item @emph{Reference}:
4081@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4082A.2.1.1.
4083@end table
4084
4085
4086
4087@node acc_get_current_cuda_context
4088@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
4089@table @asis
4090@item @emph{Description}
4091This function returns the CUDA context handle. This handle is the same
4092as used by the CUDA Runtime or Driver APIs.
4093
4094@item @emph{C/C++}:
4095@multitable @columnfractions .20 .80
4096@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
4097@end multitable
4098
4099@item @emph{Reference}:
4100@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4101A.2.1.2.
4102@end table
4103
4104
4105
4106@node acc_get_cuda_stream
4107@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
4108@table @asis
4109@item @emph{Description}
4110This function returns the CUDA stream handle for the queue @var{async}.
4111This handle is the same as used by the CUDA Runtime or Driver APIs.
4112
4113@item @emph{C/C++}:
4114@multitable @columnfractions .20 .80
4115@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
4116@end multitable
4117
4118@item @emph{Reference}:
4119@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4120A.2.1.3.
4121@end table
4122
4123
4124
4125@node acc_set_cuda_stream
4126@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
4127@table @asis
4128@item @emph{Description}
4129This function associates the stream handle specified by @var{stream} with
4130the queue @var{async}.
4131
4132This cannot be used to change the stream handle associated with
4133@code{acc_async_sync}.
4134
4135The return value is not specified.
4136
4137@item @emph{C/C++}:
4138@multitable @columnfractions .20 .80
4139@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
4140@end multitable
4141
4142@item @emph{Reference}:
4143@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4144A.2.1.4.
4145@end table
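
A sketch of sharing a stream created with the CUDA Runtime API with the
OpenACC queue @code{1} (assuming the CUDA headers and libraries are
available):

@smallexample
 cudaStream_t stream;
 cudaStreamCreate (&stream);

 /* Have OpenACC queue 1 use this CUDA stream.  */
 acc_set_cuda_stream (1, stream);

 /* acc_get_cuda_stream (1) now returns this stream.  */
@end smallexample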
4146
4147
4148
4149@node acc_prof_register
4150@section @code{acc_prof_register} -- Register callbacks.
4151@table @asis
4152@item @emph{Description}:
4153This function registers callbacks.
4154
4155@item @emph{C/C++}:
4156@multitable @columnfractions .20 .80
4157@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
4158@end multitable
4159
4160@item @emph{See also}:
4161@ref{OpenACC Profiling Interface}
4162
4163@item @emph{Reference}:
4164@uref{https://www.openacc.org, OpenACC specification v2.6}, section
41655.3.
4166@end table
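
For example, a callback for kernel-launch events might be registered as
follows (a sketch, assuming the types from @code{acc_prof.h}; the callback
name @code{cb_launch} is arbitrary):

@smallexample
 static void
 cb_launch (acc_prof_info *pi, acc_event_info *ei, acc_api_info *ai)
 @{
   /* ... inspect the event data ... */
 @}

 /* Invoke cb_launch at the start of every kernel launch.  */
 acc_prof_register (acc_ev_enqueue_launch_start, cb_launch, acc_reg);
@end smallexample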
4167
4168
4169
4170@node acc_prof_unregister
4171@section @code{acc_prof_unregister} -- Unregister callbacks.
4172@table @asis
4173@item @emph{Description}:
4174This function unregisters callbacks.
4175
4176@item @emph{C/C++}:
4177@multitable @columnfractions .20 .80
4178@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
4179@end multitable
4180
4181@item @emph{See also}:
4182@ref{OpenACC Profiling Interface}
4183
4184@item @emph{Reference}:
4185@uref{https://www.openacc.org, OpenACC specification v2.6}, section
41865.3.
4187@end table
4188
4189
4190
4191@node acc_prof_lookup
4192@section @code{acc_prof_lookup} -- Obtain inquiry functions.
4193@table @asis
4194@item @emph{Description}:
4195Function to obtain inquiry functions.
4196
4197@item @emph{C/C++}:
4198@multitable @columnfractions .20 .80
4199@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
4200@end multitable
4201
4202@item @emph{See also}:
4203@ref{OpenACC Profiling Interface}
4204
4205@item @emph{Reference}:
4206@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42075.3.
4208@end table
4209
4210
4211
4212@node acc_register_library
4213@section @code{acc_register_library} -- Library registration.
4214@table @asis
4215@item @emph{Description}:
4216Function for library registration.
4217
4218@item @emph{C/C++}:
4219@multitable @columnfractions .20 .80
4220@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
4221@end multitable
4222
4223@item @emph{See also}:
4224@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
4225
4226@item @emph{Reference}:
4227@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42285.3.
4229@end table
4230
4231
4232
4233@c ---------------------------------------------------------------------
4234@c OpenACC Environment Variables
4235@c ---------------------------------------------------------------------
4236
4237@node OpenACC Environment Variables
4238@chapter OpenACC Environment Variables
4239
4240The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
4241are defined by section 4 of the OpenACC specification in version 2.0.
4242The variable @env{ACC_PROFLIB}
4243is defined by section 4 of the OpenACC specification in version 2.6.
4244The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
4245
4246@menu
4247* ACC_DEVICE_TYPE::
4248* ACC_DEVICE_NUM::
4249* ACC_PROFLIB::
4250* GCC_ACC_NOTIFY::
4251@end menu
4252
4253
4254
4255@node ACC_DEVICE_TYPE
4256@section @code{ACC_DEVICE_TYPE}
4257@table @asis
4258@item @emph{Reference}:
4259@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42604.1.
4261@end table
4262
4263
4264
4265@node ACC_DEVICE_NUM
4266@section @code{ACC_DEVICE_NUM}
4267@table @asis
4268@item @emph{Reference}:
4269@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42704.2.
4271@end table
4272
4273
4274
4275@node ACC_PROFLIB
4276@section @code{ACC_PROFLIB}
4277@table @asis
4278@item @emph{See also}:
4279@ref{acc_register_library}, @ref{OpenACC Profiling Interface}
4280
4281@item @emph{Reference}:
4282@uref{https://www.openacc.org, OpenACC specification v2.6}, section
42834.3.
4284@end table
4285
4286
4287
4288@node GCC_ACC_NOTIFY
4289@section @code{GCC_ACC_NOTIFY}
4290@table @asis
4291@item @emph{Description}:
4292Print debug information pertaining to the accelerator.
4293@end table
4294
4295
4296
4297@c ---------------------------------------------------------------------
4298@c CUDA Streams Usage
4299@c ---------------------------------------------------------------------
4300
4301@node CUDA Streams Usage
4302@chapter CUDA Streams Usage
4303
4304This applies to the @code{nvptx} plugin only.
4305
4306The library provides elements that perform asynchronous movement of
4307data and asynchronous operation of computing constructs. This
4308asynchronous functionality is implemented by making use of CUDA
4309streams@footnote{See "Stream Management" in "CUDA Driver API",
4310TRM-06703-001, Version 5.5, for additional information}.
4311
4312The primary means by which the asynchronous functionality is accessed
4313is through the use of those OpenACC directives which make use of the
4314@code{async} and @code{wait} clauses. When the @code{async} clause is
4315first used with a directive, it creates a CUDA stream. If an
4316@code{async-argument} is used with the @code{async} clause, then the
4317stream is associated with the specified @code{async-argument}.
4318
4319Following the creation of an association between a CUDA stream and the
4320@code{async-argument} of an @code{async} clause, both the @code{wait}
4321clause and the @code{wait} directive can be used. When either the
4322clause or directive is used after stream creation, it creates a
4323rendezvous point whereby execution waits until all operations
4324associated with the @code{async-argument}, that is, stream, have
4325completed.
4326
4327Normally, the management of the streams that are created as a result of
4328using the @code{async} clause is done without any intervention by the
4329caller. This implies that the association between the @code{async-argument}
4330and the CUDA stream will be maintained for the lifetime of the program.
4331However, this association can be changed through the use of the library
4332function @code{acc_set_cuda_stream}. When the function
4333@code{acc_set_cuda_stream} is called, the CUDA stream that was
4334originally associated with the @code{async} clause will be destroyed.
4335Caution should be taken when changing the association as subsequent
4336references to the @code{async-argument} refer to a different
4337CUDA stream.
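
The directive-level view of the above can be sketched as follows: the first
@code{async(1)} clause creates the CUDA stream associated with
@code{async-argument} @code{1}, and the @code{wait(1)} directive blocks
until all operations enqueued on that stream have completed:

@smallexample
#pragma acc parallel loop async(1)
 for (i = 0; i < n; i++)
   a[i] = 2.0f * a[i];

 /* ... unrelated host work may proceed here ... */

#pragma acc wait(1)
@end smallexample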
4338
4339
4340
4341@c ---------------------------------------------------------------------
4342@c OpenACC Library Interoperability
4343@c ---------------------------------------------------------------------
4344
4345@node OpenACC Library Interoperability
4346@chapter OpenACC Library Interoperability
4347
4348@section Introduction
4349
4350The OpenACC library uses the CUDA Driver API, and may interact with
4351programs that use the Runtime library directly, or another library
4352based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
4353"Interactions with the CUDA Driver API" in
4354"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
4355Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
4356for additional information on library interoperability.}.
4357This chapter describes the use cases and what changes are
4358required in order to use both the OpenACC library and the CUBLAS and Runtime
4359libraries within a program.
4360
4361@section First invocation: NVIDIA CUBLAS library API
4362
4363In this first use case (see below), a function in the CUBLAS library is called
4364prior to any of the functions in the OpenACC library. More specifically, the
4365function @code{cublasCreate()}.
4366
4367When invoked, the function initializes the library and allocates the
4368hardware resources on the host and the device on behalf of the caller. Once
4369the initialization and allocation have completed, a handle is returned to the
4370caller. The OpenACC library also requires initialization and allocation of
4371hardware resources. Since the CUBLAS library has already allocated the
4372hardware resources for the device, all that is left to do is to initialize
4373the OpenACC library and acquire the hardware resources on the host.
4374
4375Prior to calling the OpenACC function that initializes the library and
4376allocates the host hardware resources, you need to acquire the device number
4377that was allocated during the call to @code{cublasCreate()}. Invoking the
4378runtime library function @code{cudaGetDevice()} accomplishes this. Once
4379acquired, the device number is passed along with the device type as
4380parameters to the OpenACC library function @code{acc_set_device_num()}.
4381
4382Once the call to @code{acc_set_device_num()} has completed, the OpenACC
4383library uses the context that was created during the call to
4384@code{cublasCreate()}. In other words, both libraries will be sharing the
4385same context.
4386
4387@smallexample
4388 /* Create the handle */
4389 s = cublasCreate(&h);
4390 if (s != CUBLAS_STATUS_SUCCESS)
4391 @{
4392 fprintf(stderr, "cublasCreate failed %d\n", s);
4393 exit(EXIT_FAILURE);
4394 @}
4395
4396 /* Get the device number */
4397 e = cudaGetDevice(&dev);
4398 if (e != cudaSuccess)
4399 @{
4400 fprintf(stderr, "cudaGetDevice failed %d\n", e);
4401 exit(EXIT_FAILURE);
4402 @}
4403
4404 /* Initialize OpenACC library and use device 'dev' */
4405 acc_set_device_num(dev, acc_device_nvidia);
4406
4407@end smallexample
4408@center Use Case 1
4409
4410@section First invocation: OpenACC library API
4411
4412In this second use case (see below), a function in the OpenACC library is
4413called prior to any of the functions in the CUBLAS library. More specifically,
4414the function @code{acc_set_device_num()}.
4415
4416In the use case presented here, the function @code{acc_set_device_num()}
4417is used to both initialize the OpenACC library and allocate the hardware
4418resources on the host and the device. In the call to the function, the
4419call parameters specify which device to use and what device
4420type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
4421is but one method to initialize the OpenACC library and allocate the
4422appropriate hardware resources. Other methods are available through the
4423use of environment variables and these will be discussed in the next section.
4424
4425Once the call to @code{acc_set_device_num()} has completed, other OpenACC
4426functions can be called as seen with multiple calls being made to
4427@code{acc_copyin()}. In addition, calls can be made to functions in the
4428CUBLAS library. In this use case, a call to @code{cublasCreate()} is made
4429subsequent to the calls to @code{acc_copyin()}.
4430As seen in the previous use case, a call to @code{cublasCreate()}
4431initializes the CUBLAS library and allocates the hardware resources on the
4432host and the device. However, since the device has already been allocated,
4433@code{cublasCreate()} will only initialize the CUBLAS library and allocate
4434the appropriate hardware resources on the host. The context that was created
4435as part of the OpenACC initialization is shared with the CUBLAS library,
4436similarly to the first use case.
4437
4438@smallexample
4439 dev = 0;
4440
4441 acc_set_device_num(dev, acc_device_nvidia);
4442
4443 /* Copy the first set to the device */
4444 d_X = acc_copyin(&h_X[0], N * sizeof (float));
4445 if (d_X == NULL)
4446 @{
4447 fprintf(stderr, "copyin error h_X\n");
4448 exit(EXIT_FAILURE);
4449 @}
4450
4451 /* Copy the second set to the device */
4452 d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
4453 if (d_Y == NULL)
4454 @{
4455 fprintf(stderr, "copyin error h_Y1\n");
4456 exit(EXIT_FAILURE);
4457 @}
4458
4459 /* Create the handle */
4460 s = cublasCreate(&h);
4461 if (s != CUBLAS_STATUS_SUCCESS)
4462 @{
4463 fprintf(stderr, "cublasCreate failed %d\n", s);
4464 exit(EXIT_FAILURE);
4465 @}
4466
4467 /* Perform saxpy using CUBLAS library function */
4468 s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
4469 if (s != CUBLAS_STATUS_SUCCESS)
4470 @{
4471 fprintf(stderr, "cublasSaxpy failed %d\n", s);
4472 exit(EXIT_FAILURE);
4473 @}
4474
4475 /* Copy the results from the device */
4476 acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
4477
4478@end smallexample
4479@center Use Case 2
4480
4481@section OpenACC library and environment variables
4482
4483There are two environment variables associated with the OpenACC library
4484that may be used to control the device type and device number:
4485@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
4486environment variables can be used as an alternative to calling
4487@code{acc_set_device_num()}. As seen in the second use case, the device
4488type and device number were specified using @code{acc_set_device_num()}.
4489If, however, the aforementioned environment variables were set, then the
4490call to @code{acc_set_device_num()} would not be required.
4491
4492
4493The use of the environment variables is only relevant when an OpenACC function
4494is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
4495is called prior to a call to an OpenACC function, then you must call
4496@code{acc_set_device_num()}@footnote{More complete information
4497about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
4498sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC
4499Application Programming Interface}, Version 2.6.}.
4500
4501
4502
4503@c ---------------------------------------------------------------------
4504@c OpenACC Profiling Interface
4505@c ---------------------------------------------------------------------
4506
4507@node OpenACC Profiling Interface
4508@chapter OpenACC Profiling Interface
4509
4510@section Implementation Status and Implementation-Defined Behavior
4511
4512We're implementing the OpenACC Profiling Interface as defined by the
4513OpenACC 2.6 specification. We're clarifying some aspects here as
4514@emph{implementation-defined behavior}, while they're still under
4515discussion within the OpenACC Technical Committee.
4516
4517This implementation is tuned to keep the performance impact as low as
4518possible for the (very common) case that the Profiling Interface is
4519not enabled. This is relevant, as the Profiling Interface affects all
4520the @emph{hot} code paths (in the target code, not in the offloaded
4521code). Users of the OpenACC Profiling Interface can be expected to
4522understand that performance will be impacted to some degree once the
4523Profiling Interface has been enabled: for example, because of the
4524@emph{runtime} (libgomp) calling into a third-party @emph{library} for
4525every event that has been registered.
4526
4527We're not yet accounting for the fact that @cite{OpenACC events may
4528occur during event processing}.
4529We just handle one case specially, as required by CUDA 9.0
4530@command{nvprof}, that @code{acc_get_device_type}
4531(@ref{acc_get_device_type}) may be called from
4532@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
4533callbacks.
4534
4535We're not yet implementing initialization via an
4536@code{acc_register_library} function that is either statically linked
4537in, or dynamically loaded via @env{LD_PRELOAD}.
4538Initialization via @code{acc_register_library} functions dynamically
4539loaded via the @env{ACC_PROFLIB} environment variable does work, as
4540does directly calling @code{acc_prof_register},
4541@code{acc_prof_unregister}, @code{acc_prof_lookup}.
4542
4543As currently there are no inquiry functions defined, calls to
4544@code{acc_prof_lookup} will always return @code{NULL}.
4545
4546There aren't separate @emph{start}, @emph{stop} events defined for the
4547event types @code{acc_ev_create}, @code{acc_ev_delete},
4548@code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these
4549should be triggered before or after the actual device-specific call is
4550made. We trigger them after.
4551
4552Remarks about data provided to callbacks:
4553
4554@table @asis
4555
4556@item @code{acc_prof_info.event_type}
4557It's not clear if for @emph{nested} event callbacks (for example,
4558@code{acc_ev_enqueue_launch_start} as part of a parent compute
4559construct), this should be set for the nested event
4560(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
4561construct should remain (@code{acc_ev_compute_construct_start}). In
4562this implementation, the value will generally correspond to the
4563innermost nested event type.
4564
4565@item @code{acc_prof_info.device_type}
4566@itemize
4567
4568@item
4569For @code{acc_ev_compute_construct_start}, and in presence of an
4570@code{if} clause with @emph{false} argument, this will still refer to
4571the offloading device type.
4572It's not clear if that's the expected behavior.
4573
4574@item
4575Complementary to the item before, for
4576@code{acc_ev_compute_construct_end}, this is set to
4577@code{acc_device_host} in presence of an @code{if} clause with
4578@emph{false} argument.
4579It's not clear if that's the expected behavior.
4580
4581@end itemize
4582
4583@item @code{acc_prof_info.thread_id}
4584Always @code{-1}; not yet implemented.
4585
4586@item @code{acc_prof_info.async}
4587@itemize
4588
4589@item
4590Not yet implemented correctly for
4591@code{acc_ev_compute_construct_start}.
4592
4593@item
4594In a compute construct, for host-fallback
4595execution/@code{acc_device_host} it will always be
4596@code{acc_async_sync}.
4597It's not clear if that's the expected behavior.
4598
4599@item
4600For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
4601it will always be @code{acc_async_sync}.
4602It's not clear if that's the expected behavior.
4603
4604@end itemize
4605
4606@item @code{acc_prof_info.async_queue}
4607There is no @cite{limited number of asynchronous queues} in libgomp.
4608This will always have the same value as @code{acc_prof_info.async}.
4609
4610@item @code{acc_prof_info.src_file}
4611Always @code{NULL}; not yet implemented.
4612
4613@item @code{acc_prof_info.func_name}
4614Always @code{NULL}; not yet implemented.
4615
4616@item @code{acc_prof_info.line_no}
4617Always @code{-1}; not yet implemented.
4618
4619@item @code{acc_prof_info.end_line_no}
4620Always @code{-1}; not yet implemented.
4621
4622@item @code{acc_prof_info.func_line_no}
4623Always @code{-1}; not yet implemented.
4624
4625@item @code{acc_prof_info.func_end_line_no}
4626Always @code{-1}; not yet implemented.
4627
4628@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
4629Relating to @code{acc_prof_info.event_type} discussed above, in this
4630implementation, this will always be the same value as
4631@code{acc_prof_info.event_type}.
4632
4633@item @code{acc_event_info.*.parent_construct}
4634@itemize
4635
4636@item
4637Will be @code{acc_construct_parallel} for all OpenACC compute
4638constructs as well as many OpenACC Runtime API calls; should be the
4639one matching the actual construct, or
4640@code{acc_construct_runtime_api}, respectively.
4641
4642@item
4643Will be @code{acc_construct_enter_data} or
4644@code{acc_construct_exit_data} when processing variable mappings
4645specified in OpenACC @emph{declare} directives; should be
4646@code{acc_construct_declare}.
4647
4648@item
4649For implicit @code{acc_ev_device_init_start},
4650@code{acc_ev_device_init_end}, and explicit as well as implicit
4651@code{acc_ev_alloc}, @code{acc_ev_free},
4652@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
4653@code{acc_ev_enqueue_download_start}, and
4654@code{acc_ev_enqueue_download_end}, will be
4655@code{acc_construct_parallel}; should reflect the real parent
4656construct.
4657
4658@end itemize
4659
4660@item @code{acc_event_info.*.implicit}
4661For @code{acc_ev_alloc}, @code{acc_ev_free},
4662@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
4663@code{acc_ev_enqueue_download_start}, and
4664@code{acc_ev_enqueue_download_end}, this currently will be @code{1}
4665also for explicit usage.
4666
4667@item @code{acc_event_info.data_event.var_name}
4668Always @code{NULL}; not yet implemented.
4669
4670@item @code{acc_event_info.data_event.host_ptr}
4671For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always
4672@code{NULL}.
4673
4674@item @code{typedef union acc_api_info}
4675@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
4676Information}. This should obviously be @code{typedef @emph{struct}
4677acc_api_info}.
4678
4679@item @code{acc_api_info.device_api}
4680Possibly not yet implemented correctly for
4681@code{acc_ev_compute_construct_start},
4682@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
4683will always be @code{acc_device_api_none} for these event types.
4684For @code{acc_ev_enter_data_start}, it will be
4685@code{acc_device_api_none} in some cases.
4686
4687@item @code{acc_api_info.device_type}
4688Always the same as @code{acc_prof_info.device_type}.
4689
4690@item @code{acc_api_info.vendor}
4691Always @code{-1}; not yet implemented.
4692
4693@item @code{acc_api_info.device_handle}
4694Always @code{NULL}; not yet implemented.
4695
4696@item @code{acc_api_info.context_handle}
4697Always @code{NULL}; not yet implemented.
4698
4699@item @code{acc_api_info.async_handle}
4700Always @code{NULL}; not yet implemented.
4701
4702@end table
4703
4704Remarks about certain event types:
4705
4706@table @asis
4707
4708@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
4709@itemize
4710
4711@item
4712@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
4713@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
4714@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
4715When a compute construct triggers implicit
4716@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
4717events, they currently aren't @emph{nested within} the corresponding
4718@code{acc_ev_compute_construct_start} and
4719@code{acc_ev_compute_construct_end}, but they're currently observed
4720@emph{before} @code{acc_ev_compute_construct_start}.
4721It's not clear what to do: the standard asks us to provide a lot of
4722details to the @code{acc_ev_compute_construct_start} callback, without
4723(implicitly) initializing a device first.
4724
4725@item
4726Callbacks for these event types will not be invoked for calls to the
4727@code{acc_set_device_type} and @code{acc_set_device_num} functions.
4728It's not clear if they should be.
4729
4730@end itemize
4731
4732@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
4733@itemize
4734
4735@item
4736Callbacks for these event types will also be invoked for OpenACC
4737@emph{host_data} constructs.
4738It's not clear if they should be.
4739
4740@item
4741Callbacks for these event types will also be invoked when processing
4742variable mappings specified in OpenACC @emph{declare} directives.
4743It's not clear if they should be.
4744
4745@end itemize
4746
4747@end table

Callbacks for the following event types will be invoked, but the
dispatch and the information provided therein have not yet been
thoroughly reviewed:

@itemize
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
@end itemize

During device initialization and finalization, respectively,
callbacks for the following event types will not yet be invoked:

@itemize
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@end itemize

Callbacks for the following event types have not yet been implemented,
so currently won't be invoked:

@itemize
@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
@item @code{acc_ev_runtime_shutdown}
@item @code{acc_ev_create}, @code{acc_ev_delete}
@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
@end itemize

For the following runtime library functions, not all expected
callbacks will be invoked (mostly concerning implicit device
initialization):

@itemize
@item @code{acc_get_num_devices}
@item @code{acc_set_device_type}
@item @code{acc_get_device_type}
@item @code{acc_set_device_num}
@item @code{acc_get_device_num}
@item @code{acc_init}
@item @code{acc_shutdown}
@end itemize

Aside from implicit device initialization, for the following runtime
library functions, no callbacks will be invoked for shared-memory
offloading devices (it's not clear whether they should be):

@itemize
@item @code{acc_malloc}
@item @code{acc_free}
@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
@item @code{acc_update_device}, @code{acc_update_device_async}
@item @code{acc_update_self}, @code{acc_update_self_async}
@item @code{acc_map_data}, @code{acc_unmap_data}
@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
@end itemize

@c ---------------------------------------------------------------------
@c OpenMP-Implementation Specifics
@c ---------------------------------------------------------------------

@node OpenMP-Implementation Specifics
@chapter OpenMP-Implementation Specifics

@menu
* Implementation-defined ICV Initialization::
* OpenMP Context Selectors::
* Memory allocation::
@end menu

@node Implementation-defined ICV Initialization
@section Implementation-defined ICV Initialization
@cindex Implementation specific setting

@multitable @columnfractions .30 .70
@item @var{affinity-format-var} @tab See @ref{OMP_AFFINITY_FORMAT}.
@item @var{def-allocator-var} @tab See @ref{OMP_ALLOCATOR}.
@item @var{max-active-levels-var} @tab See @ref{OMP_MAX_ACTIVE_LEVELS}.
@item @var{dyn-var} @tab See @ref{OMP_DYNAMIC}.
@item @var{nthreads-var} @tab See @ref{OMP_NUM_THREADS}.
@item @var{num-devices-var} @tab Number of non-host devices found
by GCC's run-time library.
@item @var{num-procs-var} @tab The number of CPU cores on the
initial device, except that affinity settings might lead to a
smaller number. On non-host devices, the value of the
@var{nthreads-var} ICV.
@item @var{place-partition-var} @tab See @ref{OMP_PLACES}.
@item @var{run-sched-var} @tab See @ref{OMP_SCHEDULE}.
@item @var{stacksize-var} @tab See @ref{OMP_STACKSIZE}.
@item @var{thread-limit-var} @tab See @ref{OMP_TEAMS_THREAD_LIMIT}.
@item @var{wait-policy-var} @tab See @ref{OMP_WAIT_POLICY} and
@ref{GOMP_SPINCOUNT}.
@end multitable

@node OpenMP Context Selectors
@section OpenMP Context Selectors

@code{vendor} is always @code{gnu}. References are to the GCC manual.

@multitable @columnfractions .60 .10 .25
@headitem @code{arch} @tab @code{kind} @tab @code{isa}
@item @code{x86}, @code{x86_64}, @code{i386}, @code{i486},
      @code{i586}, @code{i686}, @code{ia32}
      @tab @code{host}
      @tab See @code{-m...} flags in ``x86 Options'' (without @code{-m})
@item @code{amdgcn}, @code{gcn}
      @tab @code{gpu}
      @tab See @code{-march=} in ``AMD GCN Options''@footnote{Additionally,
      @code{gfx803} is supported as an alias for @code{fiji}.}
@item @code{nvptx}
      @tab @code{gpu}
      @tab See @code{-march=} in ``Nvidia PTX Options''
@end multitable

@node Memory allocation
@section Memory allocation

For the available predefined allocators and, as applicable, their associated
predefined memory spaces and for the available traits and their default values,
see @ref{OMP_ALLOCATOR}. Predefined allocators without an associated memory
space use the @code{omp_default_mem_space} memory space.

For the memory spaces, the following applies:
@itemize
@item @code{omp_default_mem_space} is supported
@item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
@item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
      unless the memkind library is available
@item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
      unless the memkind library is available
@end itemize

On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
library} (@code{libmemkind.so.0}) is available at runtime, it is used when
creating memory allocators requesting

@itemize
@item the memory space @code{omp_high_bw_mem_space}
@item the memory space @code{omp_large_cap_mem_space}
@item the @code{partition} trait @code{interleaved}; note that for
      @code{omp_large_cap_mem_space} the allocation will not be interleaved
@end itemize

On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
library} (@code{libnuma.so.1}) is available at runtime, it is used when
creating memory allocators requesting

@itemize
@item the @code{partition} trait @code{nearest}, except when both the
memkind library is available and the memory space is either
@code{omp_large_cap_mem_space} or @code{omp_high_bw_mem_space}
@end itemize

Note that the numa library will round up the allocation size to a multiple of
the system page size; therefore, consider using it only with large data or
by sharing allocations via the @code{pool_size} trait. Furthermore, the Linux
kernel does not guarantee that an allocation will always be on the nearest NUMA
node nor that after reallocation the same node will be used. Note additionally
that, on Linux, the default setting of the memory placement policy is to use the
current node; therefore, unless the memory placement policy has been overridden,
the @code{partition} trait @code{environment} (the default) will effectively be
a @code{nearest} allocation.

Additional notes regarding the traits:
@itemize
@item The @code{pinned} trait is unsupported.
@item The default for the @code{pool_size} trait is no pool and for every
      (re)allocation the associated library routine is called, which might
      internally use a memory pool.
@item For the @code{partition} trait, the partition part size will be the same
      as the requested size (i.e. @code{interleaved} or @code{blocked} has no
      effect), except for @code{interleaved} when the memkind library is
      available. Furthermore, for @code{nearest}, and unless the numa library
      is available, the memory might not be on the same NUMA node as the
      thread that allocated the memory; on Linux, this is in particular the
      case when the memory placement policy is set to preferred.
@item The @code{access} trait has no effect; memory is always
      accessible by all threads.
@item The @code{sync_hint} trait has no effect.
@end itemize

@c ---------------------------------------------------------------------
@c Offload-Target Specifics
@c ---------------------------------------------------------------------

@node Offload-Target Specifics
@chapter Offload-Target Specifics

The following sections present notes on the offload-target specifics.

@menu
* AMD Radeon::
* nvptx::
@end menu

@node AMD Radeon
@section AMD Radeon (GCN)

On the hardware side, there is the hierarchy (fine to coarse):
@itemize
@item work item (thread)
@item wavefront
@item work group
@item compute unit (CU)
@end itemize

All OpenMP and OpenACC levels are used, i.e.
@itemize
@item OpenMP's simd and OpenACC's vector map to work items (thread)
@item OpenMP's threads (``parallel'') and OpenACC's workers map
      to wavefronts
@item OpenMP's teams and OpenACC's gang use a threadpool with the
      size of the number of teams or gangs, respectively.
@end itemize

The used sizes are:
@itemize
@item The number of teams is the specified @code{num_teams} (OpenMP) or
      @code{num_gangs} (OpenACC) or otherwise the number of CUs. It is
      limited by two times the number of CUs.
@item The number of wavefronts is 4 for gfx900 and 16 otherwise;
      @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
      override this if smaller.
@item The wavefront has 102 scalars and 64 vectors.
@item The number of work items is always 64.
@item The hardware permits maximally 40 workgroups/CU and
      16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
@item 80 scalar registers and 24 vector registers in non-kernel functions
      (the chosen procedure-calling API).
@item For the kernel itself: as many as register pressure demands (number of
      teams and number of threads, scaled down if registers are exhausted).
@end itemize

Implementation remarks:
@itemize
@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
      using the C library @code{printf} functions and the Fortran
      @code{print}/@code{write} statements.
@item Reverse offload regions (i.e. @code{target} regions with
      @code{device(ancestor:1)}) are processed serially per @code{target}
      region such that the next reverse offload region is only executed after
      the previous one returned.
@item OpenMP code that has a @code{requires} directive with
      @code{unified_shared_memory} will remove any GCN device from the list of
      available devices (``host fallback'').
@item The available stack size can be changed using the @code{GCN_STACK_SIZE}
      environment variable; the default is 32 kiB per thread.
@end itemize



@node nvptx
@section nvptx

On the hardware side, there is the hierarchy (fine to coarse):
@itemize
@item thread
@item warp
@item thread block
@item streaming multiprocessor
@end itemize

All OpenMP and OpenACC levels are used, i.e.
@itemize
@item OpenMP's simd and OpenACC's vector map to threads
@item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
@item OpenMP's teams and OpenACC's gang use a threadpool with the
      size of the number of teams or gangs, respectively.
@end itemize

The used sizes are:
@itemize
@item The @code{warp_size} is always 32.
@item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
@item The number of teams is limited by the number of blocks the device can
      host simultaneously.
@end itemize

Additional information can be obtained by setting the environment variable
@code{GOMP_DEBUG=1} (very verbose; grep for @code{kernel.*launch} for launch
parameters).

GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA,
which caches the JIT output in the user's directory (see the CUDA
documentation; this can be tuned by the environment variables
@code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).

Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=}
command-line options still affect the used PTX ISA code and, thus, the
requirements on CUDA version and hardware.

Implementation remarks:
@itemize
@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
      using the C library @code{printf} functions. Note that the Fortran
      @code{print}/@code{write} statements are not supported, yet.
@item Compiling OpenMP code that contains @code{requires reverse_offload}
      requires at least @code{-march=sm_35}; compiling for @code{-march=sm_30}
      is not supported.
@item For code containing reverse offload (i.e. @code{target} regions with
      @code{device(ancestor:1)}), there is a slight performance penalty
      for @emph{all} target regions, consisting mostly of shutdown delay.
      Per device, reverse offload regions are processed serially such that
      the next reverse offload region is only executed after the previous
      one returned.
@item OpenMP code that has a @code{requires} directive with
      @code{unified_shared_memory} will remove any nvptx device from the
      list of available devices (``host fallback'').
@item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
      in the GCC manual.
@item The OpenMP routines @code{omp_target_memcpy_rect} and
      @code{omp_target_memcpy_rect_async} and the @code{target update}
      directive for non-contiguous list items will use the 2D and 3D
      memory-copy functions of the CUDA library. Higher dimensions will
      call those functions in a loop and are therefore supported.
@end itemize


@c ---------------------------------------------------------------------
@c The libgomp ABI
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp. Only maintainers should need them.

@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu


@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample

Alternately, we generate two copies of the parallel subfunction
and only include this in the version run by the primary thread.
Surely this is not worthwhile though...



@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name,

@smallexample
  void GOMP_critical_start (void);
  void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

@smallexample
  omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at
startup.



@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.



@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.



@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample


@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread}. Except
that OMP allows constructors for C++ objects. We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library. Failing that, we can have a set
of entry points to register ctor functions to be called.



@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function. This preserves
the semantic of new variable creation.



@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks. Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalars and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

which becomes

@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C. Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.



@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}. The thread stores its final value into the
array, and after the barrier, the primary thread iterates over the
array to collect the values.


@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock. It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.



@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
    @{
      long _e1 = _e0, i;
      for (i = _s0; i < _e1; i++)
        body;
    @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really. We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables. So the expression should remain evaluable in the
subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can equally well
leave it alone.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations. Which would mean that we wouldn't need to call any
of these routines.

There are separate routines for handling loops with an ORDERED
clause. Bookkeeping for that is non-trivial...



@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block such as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
"openacc", or "openmp", or both to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye