\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename libgomp.info
@settitle GNU libgomp
@c %**end of header


@copying
Copyright @copyright{} 2006-2022 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``Funding Free Software'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying
30
31@ifinfo
32@dircategory GNU Libraries
33@direntry
34* libgomp: (libgomp). GNU Offloading and Multi Processing Runtime Library.
35@end direntry
36
37This manual documents libgomp, the GNU Offloading and Multi Processing
38Runtime library. This is the GNU implementation of the OpenMP and
39OpenACC APIs for parallel and accelerator programming in C/C++ and
40Fortran.
41
42Published by the Free Software Foundation
4351 Franklin Street, Fifth Floor
44Boston, MA 02110-1301 USA
45
46@insertcopying
47@end ifinfo
48
49
50@setchapternewpage odd
51
52@titlepage
53@title GNU Offloading and Multi Processing Runtime Library
54@subtitle The GNU OpenMP and OpenACC Implementation
55@page
56@vskip 0pt plus 1filll
57@comment For the @value{version-GCC} Version*
58@sp 1
59Published by the Free Software Foundation @*
6051 Franklin Street, Fifth Floor@*
61Boston, MA 02110-1301, USA@*
62@sp 1
63@insertcopying
64@end titlepage
65
66@summarycontents
67@contents
68@page
69

@node Top, Enabling OpenMP
@top Introduction
@cindex Introduction

This manual documents the usage of libgomp, the GNU Offloading and
Multi Processing Runtime Library.  This includes the GNU
implementation of the @uref{https://www.openmp.org, OpenMP} Application
Programming Interface (API) for multi-platform shared-memory parallel
programming in C/C++ and Fortran, and the GNU implementation of the
@uref{https://www.openacc.org, OpenACC} Application Programming
Interface (API) for offloading of code to accelerator devices in C/C++
and Fortran.

Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
on this, support for OpenACC and offloading (both via OpenACC and via
OpenMP 4's @code{target} construct) was added later, and the library
was renamed to the GNU Offloading and Multi Processing Runtime Library.


@comment
@comment  When you add a new menu item, please keep the right hand
@comment  aligned to the same column.  Do not use tabs.  This provides
@comment  better formatting.
@comment
@menu
* Enabling OpenMP::            How to enable OpenMP for your applications.
* OpenMP Implementation Status:: List of implemented features by OpenMP version
* OpenMP Runtime Library Routines: Runtime Library Routines.
                               The OpenMP runtime application programming
                               interface.
* OpenMP Environment Variables: Environment Variables.
                               Influencing OpenMP runtime behavior with
                               environment variables.
* Enabling OpenACC::           How to enable OpenACC for your
                               applications.
* OpenACC Runtime Library Routines:: The OpenACC runtime application
                               programming interface.
* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
                               environment variables.
* CUDA Streams Usage::         Notes on the implementation of
                               asynchronous operations.
* OpenACC Library Interoperability:: OpenACC library interoperability with the
                               NVIDIA CUBLAS library.
* OpenACC Profiling Interface::
* OpenMP-Implementation Specifics:: Notes on specifics of this OpenMP
                               implementation
* Offload-Target Specifics::   Notes on offload-target specific internals
* The libgomp ABI::            Notes on the external ABI presented by libgomp.
* Reporting Bugs::             How to report bugs in the GNU Offloading and
                               Multi Processing Runtime Library.
* Copying::                    GNU general public license says
                               how you can copy and share libgomp.
* GNU Free Documentation License::
                               How you can copy and share this manual.
* Funding::                    How to help assure continued work for free
                               software.
* Library Index::              Index of this documentation.
@end menu


@c ---------------------------------------------------------------------
@c Enabling OpenMP
@c ---------------------------------------------------------------------

@node Enabling OpenMP
@chapter Enabling OpenMP

To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
flag @command{-fopenmp} must be specified.  This enables the OpenMP directive
@code{#pragma omp} in C/C++ and, for Fortran, the @code{!$omp} directives
in free form, the @code{c$omp}, @code{*$omp} and @code{!$omp} directives in
fixed form, the @code{!$} conditional compilation sentinel in free form,
and the @code{c$}, @code{*$} and @code{!$} sentinels in fixed form.  The
flag also arranges for automatic linking of the OpenMP runtime library
(@ref{Runtime Library Routines}).

A complete description of all OpenMP directives may be found in the
@uref{https://www.openmp.org, OpenMP Application Program Interface} manuals.
See also @ref{OpenMP Implementation Status}.
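As a minimal illustrative sketch (not part of the library itself): the
fragment below compiles both with and without @command{-fopenmp}, e.g.
@code{gcc -fopenmp hello.c}; without the flag the pragma is ignored and
the code runs serially.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* Declares omp_get_num_threads and friends.  */
#endif

/* Return the size of one parallel team.  The pragma below is only
   honored when compiling with -fopenmp; otherwise this function
   runs serially and reports a single thread.  */
int count_threads (void)
{
  int n = 1;
#ifdef _OPENMP
#pragma omp parallel
#pragma omp single
  n = omp_get_num_threads ();
#endif
  return n;
}
```

Calling @code{count_threads} from @code{main} then reports the team size
chosen by the runtime (1 in a serial build).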


@c ---------------------------------------------------------------------
@c OpenMP Implementation Status
@c ---------------------------------------------------------------------

@node OpenMP Implementation Status
@chapter OpenMP Implementation Status

@menu
* OpenMP 4.5::   Feature completion status to 4.5 specification
* OpenMP 5.0::   Feature completion status to 5.0 specification
* OpenMP 5.1::   Feature completion status to 5.1 specification
* OpenMP 5.2::   Feature completion status to 5.2 specification
@end menu

The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
parameter, provided by @code{omp_lib.h} and the @code{omp_lib} module, have
the value @code{201511} (i.e.@: OpenMP 4.5).
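A program can query this macro with conditional compilation; the following
sketch returns the advertised specification date (the value @code{201511}
matches the paragraph above; @code{0} indicates a build without
@command{-fopenmp}).

```c
/* Return the OpenMP specification date advertised by the compiler,
   or 0 when compiled without -fopenmp.  GCC with this libgomp
   reports 201511, i.e. OpenMP 4.5.  */
int openmp_version (void)
{
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}
```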

@node OpenMP 4.5
@section OpenMP 4.5

The OpenMP 4.5 specification is fully supported.

@node OpenMP 5.0
@section OpenMP 5.0

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Array shaping @tab N @tab
@item Array sections with non-unit strides in C and C++ @tab N @tab
@item Iterators @tab Y @tab
@item @code{metadirective} directive @tab N @tab
@item @code{declare variant} directive
      @tab P @tab @emph{simd} traits not handled correctly
@item @emph{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
      env variable @tab Y @tab
@item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
@item @code{requires} directive @tab P
      @tab complete, but no non-host device provides @code{unified_address},
      @code{unified_shared_memory} or @code{reverse_offload}
@item @code{teams} construct outside an enclosing target region @tab Y @tab
@item Non-rectangular loop nests @tab Y @tab
@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
      constructs @tab Y @tab
@item Collapse of associated loops that are imperfectly nested loops @tab N @tab
@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
      @code{simd} construct @tab Y @tab
@item @code{atomic} constructs in @code{simd} @tab Y @tab
@item @code{loop} construct @tab Y @tab
@item @code{order(concurrent)} clause @tab Y @tab
@item @code{scan} directive and @code{in_scan} modifier for the
      @code{reduction} clause @tab Y @tab
@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
@item @code{in_reduction} clause on @code{target} constructs @tab P
      @tab @code{nowait} only stub
@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
@item @code{task} modifier to @code{reduction} clause @tab Y @tab
@item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
@item @code{detach} clause to @code{task} construct @tab Y @tab
@item @code{omp_fulfill_event} runtime routine @tab Y @tab
@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
      and @code{taskloop simd} constructs @tab Y @tab
@item @code{taskloop} construct cancelable by @code{cancel} construct
      @tab Y @tab
@item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
      @tab Y @tab
@item Predefined memory spaces, memory allocators, allocator traits
      @tab Y @tab Some are only stubs
@item Memory management routines @tab Y @tab
@item @code{allocate} directive @tab N @tab
@item @code{allocate} clause @tab P @tab Initial support
@item @code{use_device_addr} clause on @code{target data} @tab Y @tab
@item @code{ancestor} modifier on @code{device} clause
      @tab Y @tab See comment for @code{requires}
@item Implicit declare target directive @tab Y @tab
@item Discontiguous array section with @code{target update} construct
      @tab N @tab
@item C/C++'s lvalue expressions in @code{to}, @code{from}
      and @code{map} clauses @tab N @tab
@item C/C++'s lvalue expressions in @code{depend} clauses @tab Y @tab
@item Nested @code{declare target} directive @tab Y @tab
@item Combined @code{master} constructs @tab Y @tab
@item @code{depend} clause on @code{taskwait} @tab Y @tab
@item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
      @tab Y @tab
@item @code{hint} clause on the @code{atomic} construct @tab Y @tab Stub only
@item @code{depobj} construct and depend objects @tab Y @tab
@item Lock hints were renamed to synchronization hints @tab Y @tab
@item @code{conditional} modifier to @code{lastprivate} clause @tab Y @tab
@item Map-order clarifications @tab P @tab
@item @code{close} @emph{map-type-modifier} @tab Y @tab
@item Mapping C/C++ pointer variables and assigning the address of
      device memory mapped by an array section @tab P @tab
@item Mapping of Fortran pointer and allocatable variables, including pointer
      and allocatable components of variables
      @tab P @tab Mapping of vars with allocatable components unsupported
@item @code{defaultmap} extensions @tab Y @tab
@item @code{declare mapper} directive @tab N @tab
@item @code{omp_get_supported_active_levels} routine @tab Y @tab
@item Runtime routines and environment variables to display runtime thread
      affinity information @tab Y @tab
@item @code{omp_pause_resource} and @code{omp_pause_resource_all} runtime
      routines @tab Y @tab
@item @code{omp_get_device_num} runtime routine @tab Y @tab
@item OMPT interface @tab N @tab
@item OMPD interface @tab N @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.0 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Supporting C++'s range-based for loop @tab Y @tab
@end multitable


@node OpenMP 5.1
@section OpenMP 5.1

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item OpenMP directive as C++ attribute specifiers @tab Y @tab
@item @code{omp_all_memory} reserved locator @tab Y @tab
@item @emph{target_device trait} in OpenMP Context @tab N @tab
@item @code{target_device} selector set in context selectors @tab N @tab
@item C/C++'s @code{declare variant} directive: elision support of
      preprocessed code @tab N @tab
@item @code{declare variant}: new clauses @code{adjust_args} and
      @code{append_args} @tab N @tab
@item @code{dispatch} construct @tab N @tab
@item device-specific ICV settings with environment variables @tab Y @tab
@item @code{assume} directive @tab Y @tab
@item @code{nothing} directive @tab Y @tab
@item @code{error} directive @tab Y @tab
@item @code{masked} construct @tab Y @tab
@item @code{scope} directive @tab Y @tab
@item Loop transformation constructs @tab N @tab
@item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
      clauses of the @code{taskloop} construct @tab Y @tab
@item @code{align} clause/modifier in @code{allocate} directive/clause
      and @code{allocator} directive @tab P @tab C/C++ on clause only
@item @code{thread_limit} clause to @code{target} construct @tab Y @tab
@item @code{has_device_addr} clause to @code{target} construct @tab Y @tab
@item Iterators in @code{target update} motion clauses and @code{map}
      clauses @tab N @tab
@item Indirect calls to the device version of a procedure or function in
      @code{target} regions @tab N @tab
@item @code{interop} directive @tab N @tab
@item @code{omp_interop_t} object support in runtime routines @tab N @tab
@item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
@item Extensions to the @code{atomic} directive @tab Y @tab
@item @code{seq_cst} clause on a @code{flush} construct @tab Y @tab
@item @code{inoutset} argument to the @code{depend} clause @tab Y @tab
@item @code{private} and @code{firstprivate} argument to @code{default}
      clause in C and C++ @tab Y @tab
@item @code{present} argument to @code{defaultmap} clause @tab N @tab
@item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
      @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
      routines @tab Y @tab
@item @code{omp_target_is_accessible} runtime routine @tab Y @tab
@item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
      runtime routines @tab Y @tab
@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
@item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
      @code{omp_aligned_calloc} runtime routines @tab Y @tab
@item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
      @code{omp_atv_default} changed @tab Y @tab
@item @code{omp_display_env} runtime routine @tab Y @tab
@item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
@item @code{ompt_sync_region_t} enum additions @tab N @tab
@item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}
      and @code{ompt_state_wait_barrier_teams} @tab N @tab
@item @code{ompt_callback_target_data_op_emi_t},
      @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
      and @code{ompt_callback_target_submit_emi_t} @tab N @tab
@item @code{ompt_callback_error_t} type @tab N @tab
@item @code{OMP_PLACES} syntax extensions @tab Y @tab
@item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
      variables @tab Y @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.1 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item Support of strictly structured blocks in Fortran @tab Y @tab
@item Support of structured block sequences in C/C++ @tab Y @tab
@item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
      clause @tab Y @tab
@item Support @code{begin/end declare target} syntax in C/C++ @tab Y @tab
@item Pointer predetermined firstprivate getting initialized
      to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item For Fortran, diagnose placing declarative before/between @code{USE},
      @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
@end multitable


@node OpenMP 5.2
@section OpenMP 5.2

@unnumberedsubsec New features listed in Appendix B of the OpenMP specification

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item @code{omp_in_explicit_task} routine and @emph{explicit-task-var} ICV
      @tab Y @tab
@item @code{omp}/@code{ompx}/@code{omx} sentinels and @code{omp_}/@code{ompx_}
      namespaces @tab N/A
      @tab warning for @code{ompx/omx} sentinels@footnote{The @code{ompx}
      sentinel as C/C++ pragma and C++ attributes are warned for with
      @code{-Wunknown-pragmas} (implied by @code{-Wall}) and @code{-Wattributes}
      (enabled by default), respectively; for Fortran free-source code, there is
      a warning enabled by default and, for fixed-source code, the @code{omx}
      sentinel is warned for with @code{-Wsurprising} (enabled by
      @code{-Wall}).  Unknown clauses are always rejected with an error.}
@item Clauses on @code{end} directive can be on directive @tab N @tab
@item Deprecation of no-argument @code{destroy} clause on @code{depobj}
      @tab N @tab
@item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
@item Deprecation of minus operator for reductions @tab N @tab
@item Deprecation of separating @code{map} modifiers without comma @tab N @tab
@item @code{declare mapper} with iterator and @code{present} modifiers
      @tab N @tab
@item If a matching mapped list item is not found in the data environment, the
      pointer retains its original value @tab N @tab
@item New @code{enter} clause as alias for @code{to} on declare target directive
      @tab Y @tab
@item Deprecation of @code{to} clause on declare target directive @tab N @tab
@item Extended list of directives permitted in Fortran pure procedures
      @tab N @tab
@item New @code{allocators} directive for Fortran @tab N @tab
@item Deprecation of @code{allocate} directive for Fortran
      allocatables/pointers @tab N @tab
@item Optional paired @code{end} directive with @code{dispatch} @tab N @tab
@item New @code{memspace} and @code{traits} modifiers for @code{uses_allocators}
      @tab N @tab
@item Deprecation of traits array following the allocator_handle expression in
      @code{uses_allocators} @tab N @tab
@item New @code{otherwise} clause as alias for @code{default} on metadirectives
      @tab N @tab
@item Deprecation of @code{default} clause on metadirectives @tab N @tab
@item Deprecation of delimited form of @code{declare target} @tab N @tab
@item Reproducible semantics changed for @code{order(concurrent)} @tab N @tab
@item @code{allocate} and @code{firstprivate} clauses on @code{scope}
      @tab Y @tab
@item @code{ompt_callback_work} @tab N @tab
@item Default map-type for @code{map} clause in @code{target enter/exit data}
      @tab Y @tab
@item New @code{doacross} clause as alias for @code{depend} with
      @code{source}/@code{sink} modifier @tab Y @tab
@item Deprecation of @code{depend} with @code{source}/@code{sink} modifier
      @tab N @tab
@item @code{omp_cur_iteration} keyword @tab Y @tab
@end multitable

@unnumberedsubsec Other new OpenMP 5.2 features

@multitable @columnfractions .60 .10 .25
@headitem Description @tab Status @tab Comments
@item For Fortran, optional comma between directive and clause @tab N @tab
@item Conforming device numbers and @code{omp_initial_device} and
      @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
@item Initial value of @emph{default-device-var} ICV with
      @code{OMP_TARGET_OFFLOAD=mandatory} @tab N @tab
@item @emph{interop_types} in any position of the modifier list for the
      @code{init} clause of the @code{interop} construct @tab N @tab
@end multitable


@c ---------------------------------------------------------------------
@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------

@node Runtime Library Routines
@chapter OpenMP Runtime Library Routines

The runtime routines described here are defined by Section 3 of the OpenMP
specification in version 4.5.  The routines are structured in the following
four parts:

439@menu
440Control threads, processors and the parallel environment. They have C
441linkage, and do not throw exceptions.
442
443* omp_get_active_level:: Number of active parallel regions
444* omp_get_ancestor_thread_num:: Ancestor thread ID
445* omp_get_cancellation:: Whether cancellation support is enabled
446* omp_get_default_device:: Get the default device for target regions
447* omp_get_device_num:: Get device that current thread is running on
448* omp_get_dynamic:: Dynamic teams setting
449* omp_get_initial_device:: Device number of host device
450* omp_get_level:: Number of parallel regions
451* omp_get_max_active_levels:: Current maximum number of active regions
452* omp_get_max_task_priority:: Maximum task priority value that can be set
453* omp_get_max_teams:: Maximum number of teams for teams region
454* omp_get_max_threads:: Maximum number of threads of parallel region
455* omp_get_nested:: Nested parallel regions
456* omp_get_num_devices:: Number of target devices
457* omp_get_num_procs:: Number of processors online
458* omp_get_num_teams:: Number of teams
459* omp_get_num_threads:: Size of the active team
460* omp_get_proc_bind:: Whether theads may be moved between CPUs
461* omp_get_schedule:: Obtain the runtime scheduling method
462* omp_get_supported_active_levels:: Maximum number of active regions supported
463* omp_get_team_num:: Get team number
464* omp_get_team_size:: Number of threads in a team
465* omp_get_teams_thread_limit:: Maximum number of threads imposed by teams
466* omp_get_thread_limit:: Maximum number of threads
467* omp_get_thread_num:: Current thread ID
468* omp_in_parallel:: Whether a parallel region is active
469* omp_in_final:: Whether in final or included task region
470* omp_is_initial_device:: Whether executing on the host device
471* omp_set_default_device:: Set the default device for target regions
472* omp_set_dynamic:: Enable/disable dynamic teams
473* omp_set_max_active_levels:: Limits the number of active parallel regions
474* omp_set_nested:: Enable/disable nested parallel regions
475* omp_set_num_teams:: Set upper teams limit for teams region
476* omp_set_num_threads:: Set upper team size limit
477* omp_set_schedule:: Set the runtime scheduling method
478* omp_set_teams_thread_limit:: Set upper thread limit for teams construct
479
480Initialize, set, test, unset and destroy simple and nested locks.
481
482* omp_init_lock:: Initialize simple lock
483* omp_set_lock:: Wait for and set simple lock
484* omp_test_lock:: Test and set simple lock if available
485* omp_unset_lock:: Unset simple lock
486* omp_destroy_lock:: Destroy simple lock
487* omp_init_nest_lock:: Initialize nested lock
488* omp_set_nest_lock:: Wait for and set simple lock
489* omp_test_nest_lock:: Test and set nested lock if available
490* omp_unset_nest_lock:: Unset nested lock
491* omp_destroy_nest_lock:: Destroy nested lock
492
493Portable, thread-based, wall clock timer.
494
495* omp_get_wtick:: Get timer precision.
496* omp_get_wtime:: Elapsed wall clock time.
497
498Support for event objects.
499
500* omp_fulfill_event:: Fulfill and destroy an OpenMP event.
501@end menu
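The simple-lock routines follow the usual init/set/unset/destroy life
cycle.  The following sketch (illustrative only; the serial fallback is an
assumption for builds without @command{-fopenmp}) serializes updates of a
shared counter with a simple lock.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Increment a shared counter from several threads, serializing the
   update with a simple lock.  Falls back to a serial loop when
   compiled without -fopenmp.  */
int locked_count (int iterations)
{
  int counter = 0;
#ifdef _OPENMP
  omp_lock_t lock;
  omp_init_lock (&lock);
#pragma omp parallel for
  for (int i = 0; i < iterations; i++)
    {
      omp_set_lock (&lock);    /* Wait for and set the lock.  */
      counter++;               /* Critical section.  */
      omp_unset_lock (&lock);
    }
  omp_destroy_lock (&lock);
#else
  for (int i = 0; i < iterations; i++)
    counter++;
#endif
  return counter;
}
```

Without the lock the unsynchronized increments would race and the final
count could be smaller than the number of iterations.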



@node omp_get_active_level
@section @code{omp_get_active_level} -- Number of parallel regions
@table @asis
@item @emph{Description}:
This function returns the nesting level of the active parallel blocks
enclosing the point of the call.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table



@node omp_get_ancestor_thread_num
@section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
@table @asis
@item @emph{Description}:
This function returns the thread identification number of the ancestor
thread of the current thread at the given nesting level.  For values of
@var{level} outside the range 0 to @code{omp_get_level}, -1 is returned;
if @var{level} equals @code{omp_get_level}, the result is identical to
@code{omp_get_thread_num}.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table
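The boundary behavior described above can be checked directly; this sketch
(an assumption-laden illustration, with a trivial serial fallback) verifies
that at level @code{omp_get_level()} the routine agrees with
@code{omp_get_thread_num} and that an out-of-range level yields -1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return 1 if omp_get_ancestor_thread_num behaves as specified at the
   current nesting level and one level beyond it; returns 1 trivially
   in a serial build.  */
int check_ancestor (void)
{
#ifdef _OPENMP
  int ok = 1;
#pragma omp parallel
#pragma omp single
  {
    int level = omp_get_level ();
    ok = (omp_get_ancestor_thread_num (level) == omp_get_thread_num ())
         && (omp_get_ancestor_thread_num (level + 1) == -1);
  }
  return ok;
#else
  return 1;
#endif
}
```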



@node omp_get_cancellation
@section @code{omp_get_cancellation} -- Whether cancellation support is enabled
@table @asis
@item @emph{Description}:
This function returns @code{true} if cancellation is activated, @code{false}
otherwise.  Here, @code{true} and @code{false} represent their language-specific
counterparts.  Unless @env{OMP_CANCELLATION} is set to true, cancellation is
deactivated.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
@end multitable

@item @emph{See also}:
@ref{OMP_CANCELLATION}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table



@node omp_get_default_device
@section @code{omp_get_default_device} -- Get the default device for target regions
@table @asis
@item @emph{Description}:
Get the default device for target regions without a @code{device} clause.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table



@node omp_get_device_num
@section @code{omp_get_device_num} -- Return device number of current device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the device that the
current thread is executing on.  For OpenMP 5.0, this must be equal to the
value returned by the @code{omp_get_initial_device} function when called
from the host.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_initial_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
@end table



@node omp_get_dynamic
@section @code{omp_get_dynamic} -- Dynamic teams setting
@table @asis
@item @emph{Description}:
This function returns @code{true} if enabled, @code{false} otherwise.
Here, @code{true} and @code{false} represent their language-specific
counterparts.

The dynamic team setting may be initialized at startup by the
@env{OMP_DYNAMIC} environment variable or at runtime using
@code{omp_set_dynamic}.  If undefined, dynamic adjustment is
disabled by default.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
@end multitable

@item @emph{See also}:
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table
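A small sketch of the getter/setter pair described above (illustrative
only; the serial fallback returning 0 is an assumption for builds without
@command{-fopenmp}):

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Disable dynamic team-size adjustment, then request it, and return
   the final setting (0 or 1) as reported by omp_get_dynamic.  In a
   serial build there is nothing to toggle, so 0 is returned.  */
int dynamic_setting (void)
{
#ifdef _OPENMP
  omp_set_dynamic (0);   /* Dynamic adjustment off...  */
  omp_set_dynamic (1);   /* ...and back on.  */
  return omp_get_dynamic () != 0;
#else
  return 0;
#endif
}
```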



@node omp_get_initial_device
@section @code{omp_get_initial_device} -- Return device number of initial device
@table @asis
@item @emph{Description}:
This function returns a device number that represents the host device.
For OpenMP 5.1, this must be equal to the value returned by the
@code{omp_get_num_devices} function.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_devices}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
@end table



@node omp_get_level
@section @code{omp_get_level} -- Obtain the current nesting level
@table @asis
@item @emph{Description}:
This function returns the nesting level of the parallel blocks
enclosing the point of the call.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_level(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_level()}
@end multitable

@item @emph{See also}:
@ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table



@node omp_get_max_active_levels
@section @code{omp_get_max_active_levels} -- Current maximum number of active regions
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed number of nested, active parallel regions.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table


@node omp_get_max_task_priority
@section @code{omp_get_max_task_priority} -- Maximum priority value that can be set for tasks
@table @asis
@item @emph{Description}:
This function obtains the maximum allowed priority number for tasks.

@item @emph{C/C++}
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table
770
771
772@node omp_get_max_teams
773@section @code{omp_get_max_teams} -- Maximum number of teams of teams region
774@table @asis
775@item @emph{Description}:
776Return the maximum number of teams used for the teams region
777that does not use the clause @code{num_teams}.
778
779@item @emph{C/C++}:
780@multitable @columnfractions .20 .80
781@item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
782@end multitable
783
784@item @emph{Fortran}:
785@multitable @columnfractions .20 .80
786@item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
787@end multitable
788
789@item @emph{See also}:
790@ref{omp_set_num_teams}, @ref{omp_get_num_teams}
791
792@item @emph{Reference}:
793@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
794@end table
795


@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@table @asis
@item @emph{Description}:
Return the maximum number of threads used for the current parallel region
that does not use the clause @code{num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table



@node omp_get_nested
@section @code{omp_get_nested} -- Nested parallel regions
@table @asis
@item @emph{Description}:
This function returns @code{true} if nested parallel regions are
enabled, @code{false} otherwise.  Here, @code{true} and @code{false}
represent their language-specific counterparts.

The state of nested parallel regions at startup depends on several
environment variables.  If @env{OMP_MAX_ACTIVE_LEVELS} is defined
and is set to greater than one, then nested parallel regions will be
enabled.  If not defined, then the value of the @env{OMP_NESTED}
environment variable will be followed if defined.  If neither are
defined, then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND}
are defined with a list of more than one value, then nested parallel
regions are enabled.  If none of these are defined, then nested parallel
regions are disabled by default.

Nested parallel regions can be enabled or disabled at runtime using
@code{omp_set_nested}, or by setting the maximum number of nested
regions with @code{omp_set_max_active_levels} to one to disable, or
above one to enable.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_get_nested()}
@end multitable

@item @emph{See also}:
@ref{omp_set_max_active_levels}, @ref{omp_set_nested},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table


@node omp_get_num_devices
@section @code{omp_get_num_devices} -- Number of target devices
@table @asis
@item @emph{Description}:
Returns the number of target devices.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table



@node omp_get_num_procs
@section @code{omp_get_num_procs} -- Number of processors online
@table @asis
@item @emph{Description}:
Returns the number of processors online on the device the calling
thread is executing on.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table



@node omp_get_num_teams
@section @code{omp_get_num_teams} -- Number of teams
@table @asis
@item @emph{Description}:
Returns the number of teams in the current teams region.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table



@node omp_get_num_threads
@section @code{omp_get_num_threads} -- Size of the active team
@table @asis
@item @emph{Description}:
Returns the number of threads in the current team.  In a sequential
section of the program, @code{omp_get_num_threads} returns 1.

The default team size may be initialized at startup by the
@env{OMP_NUM_THREADS} environment variable.  At runtime, the size
of the current team may be set either by the @code{num_threads}
clause or by @code{omp_set_num_threads}.  If none of the above were
used to define a specific value and @env{OMP_DYNAMIC} is disabled,
one thread per CPU online is used.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table


@node omp_get_proc_bind
@section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
@table @asis
@item @emph{Description}:
This function returns the currently active thread affinity policy, which is
set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
@code{omp_proc_bind_true}, @code{omp_proc_bind_primary},
@code{omp_proc_bind_master}, @code{omp_proc_bind_close} and @code{omp_proc_bind_spread},
where @code{omp_proc_bind_master} is an alias for @code{omp_proc_bind_primary}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
@end multitable

@item @emph{See also}:
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table



@node omp_get_schedule
@section @code{omp_get_schedule} -- Obtain the runtime scheduling method
@table @asis
@item @emph{Description}:
Obtain the runtime scheduling method.  The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
@var{chunk_size}, is set to the chunk size.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@end table


@node omp_get_supported_active_levels
@section @code{omp_get_supported_active_levels} -- Maximum number of active regions supported
@table @asis
@item @emph{Description}:
This function returns the maximum number of nested, active parallel regions
supported by this implementation.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
@end table



@node omp_get_team_num
@section @code{omp_get_team_num} -- Get team number
@table @asis
@item @emph{Description}:
Returns the team number of the calling thread.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table



@node omp_get_team_size
@section @code{omp_get_team_size} -- Number of threads in a team
@table @asis
@item @emph{Description}:
This function returns the number of threads in the thread team to which
either the current thread or its ancestor belongs.  For values of
@var{level} outside the range 0 to @code{omp_get_level}, -1 is returned;
if @var{level} is zero, 1 is returned; and if @var{level} equals
@code{omp_get_level}, the result is identical to @code{omp_get_num_threads}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
@item @tab @code{integer level}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table



@node omp_get_teams_thread_limit
@section @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
@table @asis
@item @emph{Description}:
Return the maximum number of threads that will be able to participate in
each team created by a teams construct.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
@end table



@node omp_get_thread_limit
@section @code{omp_get_thread_limit} -- Maximum number of threads
@table @asis
@item @emph{Description}:
Return the maximum number of threads of the program.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table



@node omp_get_thread_num
@section @code{omp_get_thread_num} -- Current thread ID
@table @asis
@item @emph{Description}:
Returns a unique thread identification number within the current team.
In sequential parts of the program, @code{omp_get_thread_num}
always returns 0.  In parallel regions the return value varies
from 0 to @code{omp_get_num_threads}-1 inclusive.  The return
value of the primary thread of a team is always 0.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
@end multitable

@item @emph{See also}:
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@end table


@node omp_in_parallel
@section @code{omp_in_parallel} -- Whether a parallel region is active
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in parallel,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@end table


@node omp_in_final
@section @code{omp_in_final} -- Whether in final or included task region
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running in a final
or included task region, @code{false} otherwise.  Here, @code{true}
and @code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_in_final(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_in_final()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table



@node omp_is_initial_device
@section @code{omp_is_initial_device} -- Whether executing on the host device
@table @asis
@item @emph{Description}:
This function returns @code{true} if currently running on the host device,
@code{false} otherwise.  Here, @code{true} and @code{false} represent
their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
@end multitable

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table



@node omp_set_default_device
@section @code{omp_set_default_device} -- Set the default device for target regions
@table @asis
@item @emph{Description}:
Set the default device for target regions without a device clause.  The
argument shall be a nonnegative device number.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
@item @tab @code{integer device_num}
@end multitable

@item @emph{See also}:
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table



@node omp_set_dynamic
@section @code{omp_set_dynamic} -- Enable/disable dynamic teams
@table @asis
@item @emph{Description}:
Enable or disable the dynamic adjustment of the number of threads
within a team.  The function takes the language-specific equivalent
of @code{true} and @code{false}, where @code{true} enables dynamic
adjustment of team sizes and @code{false} disables it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
@item @tab @code{logical, intent(in) :: dynamic_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@end table



@node omp_set_max_active_levels
@section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
@table @asis
@item @emph{Description}:
This function limits the maximum allowed number of nested, active
parallel regions.  @var{max_levels} must be less than or equal to
the value returned by @code{omp_get_supported_active_levels}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
@item @tab @code{integer max_levels}
@end multitable

@item @emph{See also}:
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
@ref{omp_get_supported_active_levels}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@end table



@node omp_set_nested
@section @code{omp_set_nested} -- Enable/disable nested parallel regions
@table @asis
@item @emph{Description}:
Enable or disable nested parallel regions, i.e., whether team members
are allowed to create new teams.  The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.

Enabling nested parallel regions will also set the maximum number of
active nested regions to the maximum supported.  Disabling nested parallel
regions will set the maximum number of active nested regions to one.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
@item @tab @code{logical, intent(in) :: nested}
@end multitable

@item @emph{See also}:
@ref{omp_get_nested}, @ref{omp_set_max_active_levels},
@ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table



@node omp_set_num_teams
@section @code{omp_set_num_teams} -- Set upper teams limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound for the number of teams created by a teams
construct that does not specify a @code{num_teams} clause.  The
argument of @code{omp_set_num_teams} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_teams(int num_teams);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_teams(num_teams)}
@item @tab @code{integer, intent(in) :: num_teams}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_TEAMS}, @ref{omp_get_num_teams}, @ref{omp_get_max_teams}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.3.
@end table



@node omp_set_num_threads
@section @code{omp_set_num_threads} -- Set upper team size limit
@table @asis
@item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
sections, if those do not specify a @code{num_threads} clause.  The
argument of @code{omp_set_num_threads} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
@item @tab @code{integer, intent(in) :: num_threads}
@end multitable

@item @emph{See also}:
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table


@node omp_set_schedule
@section @code{omp_set_schedule} -- Set the runtime scheduling method
@table @asis
@item @emph{Description}:
Sets the runtime scheduling method.  The @var{kind} argument can have the
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
@code{omp_sched_auto}, the chunk size is set to the value of
@var{chunk_size} if positive, or to the default value if zero or negative.
For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
@item @tab @code{integer chunk_size}
@end multitable

@item @emph{See also}:
@ref{omp_get_schedule}, @ref{OMP_SCHEDULE}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
@end table


@node omp_set_teams_thread_limit
@section @code{omp_set_teams_thread_limit} -- Set upper thread limit for teams construct
@table @asis
@item @emph{Description}:
Specifies the upper bound for the number of threads that will be available
for each team created by a teams construct that does not specify a
@code{thread_limit} clause.  The argument of
@code{omp_set_teams_thread_limit} shall be a positive integer.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_teams_thread_limit(int thread_limit);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_teams_thread_limit(thread_limit)}
@item @tab @code{integer, intent(in) :: thread_limit}
@end multitable

@item @emph{See also}:
@ref{OMP_TEAMS_THREAD_LIMIT}, @ref{omp_get_teams_thread_limit}, @ref{omp_get_thread_limit}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.5.
@end table



@node omp_init_lock
@section @code{omp_init_lock} -- Initialize simple lock
@table @asis
@item @emph{Description}:
Initialize a simple lock.  After initialization, the lock is in
an unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_destroy_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table



@node omp_set_lock
@section @code{omp_set_lock} -- Wait for and set simple lock
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}.  The calling thread is blocked until the lock
is available.  If the lock is already held by the current thread,
a deadlock occurs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table



@node omp_test_lock
@section @code{omp_test_lock} -- Test and set simple lock if available
@table @asis
@item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}.  Contrary to @code{omp_set_lock}, @code{omp_test_lock}
does not block if the lock is not available.  This function returns
@code{true} upon success, @code{false} otherwise.  Here, @code{true} and
@code{false} represent their language-specific counterparts.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table



@node omp_unset_lock
@section @code{omp_unset_lock} -- Unset simple lock
@table @asis
@item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before.  In addition, the lock must be held by the
thread calling @code{omp_unset_lock}.  The lock then becomes unlocked.  If one
or more threads are waiting to set the lock, one of them acquires it.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_set_lock}, @ref{omp_test_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table



@node omp_destroy_lock
@section @code{omp_destroy_lock} -- Destroy simple lock
@table @asis
@item @emph{Description}:
Destroy a simple lock.  In order to be destroyed, a simple lock must be
in the unlocked state.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
@item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
@end multitable

@item @emph{See also}:
@ref{omp_init_lock}

@item @emph{Reference}:
@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table
1630
1631
1632
1633@node omp_init_nest_lock
1634@section @code{omp_init_nest_lock} -- Initialize nested lock
1635@table @asis
1636@item @emph{Description}:
1637Initialize a nested lock. After initialization, the lock is in
1638an unlocked state and the nesting count is set to zero.
1639
1640@item @emph{C/C++}:
1641@multitable @columnfractions .20 .80
1642@item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
1643@end multitable
1644
1645@item @emph{Fortran}:
1646@multitable @columnfractions .20 .80
1647@item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
1648@item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
1649@end multitable
1650
1651@item @emph{See also}:
1652@ref{omp_destroy_nest_lock}
1653
1654@item @emph{Reference}:
1655@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1656@end table
1657
1658
1659@node omp_set_nest_lock
1660@section @code{omp_set_nest_lock} -- Wait for and set nested lock
1661@table @asis
1662@item @emph{Description}:
1663Before setting a nested lock, the lock variable must be initialized by
1664@code{omp_init_nest_lock}. The calling thread is blocked until the lock
1665is available. If the lock is already held by the current thread, the
1666nesting count for the lock is incremented.
1667
1668@item @emph{C/C++}:
1669@multitable @columnfractions .20 .80
1670@item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
1671@end multitable
1672
1673@item @emph{Fortran}:
1674@multitable @columnfractions .20 .80
1675@item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
1676@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1677@end multitable
1678
1679@item @emph{See also}:
1680@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
1681
1682@item @emph{Reference}:
1683@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1684@end table
1685
1686
1687
1688@node omp_test_nest_lock
1689@section @code{omp_test_nest_lock} -- Test and set nested lock if available
1690@table @asis
1691@item @emph{Description}:
1692Before setting a nested lock, the lock variable must be initialized by
1693@code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
1694@code{omp_test_nest_lock} does not block if the lock is not available.
1695If the lock is already held by the current thread, the new nesting count
1696is returned. Otherwise, the return value equals zero.
1697
1698@item @emph{C/C++}:
1699@multitable @columnfractions .20 .80
1700@item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
1701@end multitable
1702
1703@item @emph{Fortran}:
1704@multitable @columnfractions .20 .80
1705@item @emph{Interface}: @tab @code{logical function omp_test_nest_lock(nvar)}
1706@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1707@end multitable
1708
1709
1710@item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
1712
1713@item @emph{Reference}:
1714@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1715@end table
1716
1717
1718
1719@node omp_unset_nest_lock
1720@section @code{omp_unset_nest_lock} -- Unset nested lock
1721@table @asis
1722@item @emph{Description}:
A nested lock about to be unset must have been locked by
@code{omp_set_nest_lock} or @code{omp_test_nest_lock} before. In
addition, the lock must be held by the thread calling
@code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the
lock before, one of them is chosen to, again, set the lock to itself.
1728
1729@item @emph{C/C++}:
1730@multitable @columnfractions .20 .80
1731@item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
1732@end multitable
1733
1734@item @emph{Fortran}:
1735@multitable @columnfractions .20 .80
1736@item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
1737@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1738@end multitable
1739
1740@item @emph{See also}:
1741@ref{omp_set_nest_lock}
1742
1743@item @emph{Reference}:
1744@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1745@end table
1746
1747
1748
1749@node omp_destroy_nest_lock
1750@section @code{omp_destroy_nest_lock} -- Destroy nested lock
1751@table @asis
1752@item @emph{Description}:
1753Destroy a nested lock. In order to be destroyed, a nested lock must be
1754in the unlocked state and its nesting count must equal zero.
1755
1756@item @emph{C/C++}:
1757@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
1759@end multitable
1760
1761@item @emph{Fortran}:
1762@multitable @columnfractions .20 .80
1763@item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
1764@item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1765@end multitable
1766
1767@item @emph{See also}:
@ref{omp_init_nest_lock}
1769
1770@item @emph{Reference}:
1771@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1772@end table
1773
1774
1775
1776@node omp_get_wtick
1777@section @code{omp_get_wtick} -- Get timer precision
1778@table @asis
1779@item @emph{Description}:
1780Gets the timer precision, i.e., the number of seconds between two
1781successive clock ticks.
1782
1783@item @emph{C/C++}:
1784@multitable @columnfractions .20 .80
1785@item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
1786@end multitable
1787
1788@item @emph{Fortran}:
1789@multitable @columnfractions .20 .80
1790@item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
1791@end multitable
1792
1793@item @emph{See also}:
1794@ref{omp_get_wtime}
1795
1796@item @emph{Reference}:
1797@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
1798@end table
1799
1800
1801
1802@node omp_get_wtime
1803@section @code{omp_get_wtime} -- Elapsed wall clock time
1804@table @asis
1805@item @emph{Description}:
1806Elapsed wall clock time in seconds. The time is measured per thread, no
1807guarantee can be made that two distinct threads measure the same time.
1808Time is measured from some "time in the past", which is an arbitrary time
1809guaranteed not to change during the execution of the program.
1810
1811@item @emph{C/C++}:
1812@multitable @columnfractions .20 .80
1813@item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
1814@end multitable
1815
1816@item @emph{Fortran}:
1817@multitable @columnfractions .20 .80
1818@item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
1819@end multitable
1820
1821@item @emph{See also}:
1822@ref{omp_get_wtick}
1823
1824@item @emph{Reference}:
1825@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
1826@end table
1827
1828
1829
1830@node omp_fulfill_event
1831@section @code{omp_fulfill_event} -- Fulfill and destroy an OpenMP event
1832@table @asis
1833@item @emph{Description}:
Fulfill the event associated with the event handle argument. Currently,
it is only used to fulfill events generated by detach clauses on task
constructs; the effect of fulfilling the event is to allow the task to
complete.
1838
1839The result of calling @code{omp_fulfill_event} with an event handle other
1840than that generated by a detach clause is undefined. Calling it with an
1841event handle that has already been fulfilled is also undefined.
1842
1843@item @emph{C/C++}:
1844@multitable @columnfractions .20 .80
1845@item @emph{Prototype}: @tab @code{void omp_fulfill_event(omp_event_handle_t event);}
1846@end multitable
1847
1848@item @emph{Fortran}:
1849@multitable @columnfractions .20 .80
1850@item @emph{Interface}: @tab @code{subroutine omp_fulfill_event(event)}
1851@item @tab @code{integer (kind=omp_event_handle_kind) :: event}
1852@end multitable
1853
1854@item @emph{Reference}:
1855@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.5.1.
1856@end table
1857
1858
1859
1860@c ---------------------------------------------------------------------
1861@c OpenMP Environment Variables
1862@c ---------------------------------------------------------------------
1863
1864@node Environment Variables
1865@chapter OpenMP Environment Variables
1866
The environment variables beginning with @env{OMP_} are defined by
section 4 of the OpenMP specification in version 4.5, while those
beginning with @env{GOMP_} are GNU extensions.
1870
1871@menu
1872* OMP_CANCELLATION:: Set whether cancellation is activated
1873* OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
1874* OMP_DEFAULT_DEVICE:: Set the device used in target regions
1875* OMP_DYNAMIC:: Dynamic adjustment of threads
1876* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
1877* OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
1878* OMP_NESTED:: Nested parallel regions
1879* OMP_NUM_TEAMS:: Specifies the number of teams to use by teams region
1880* OMP_NUM_THREADS:: Specifies the number of threads to use
* OMP_PROC_BIND:: Whether threads may be moved between CPUs
* OMP_PLACES:: Specifies on which CPUs the threads should be placed
1883* OMP_STACKSIZE:: Set default thread stack size
1884* OMP_SCHEDULE:: How threads are scheduled
1885* OMP_TARGET_OFFLOAD:: Controls offloading behaviour
1886* OMP_TEAMS_THREAD_LIMIT:: Set the maximum number of threads imposed by teams
1887* OMP_THREAD_LIMIT:: Set the maximum number of threads
1888* OMP_WAIT_POLICY:: How waiting threads are handled
1889* GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
1890* GOMP_DEBUG:: Enable debugging output
1891* GOMP_STACKSIZE:: Set default thread stack size
1892* GOMP_SPINCOUNT:: Set the busy-wait spin count
1893* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
1894@end menu
1895
1896
1897@node OMP_CANCELLATION
1898@section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
1899@cindex Environment Variable
1900@table @asis
1901@item @emph{Description}:
If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
1903if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1904
1905@item @emph{See also}:
1906@ref{omp_get_cancellation}
1907
1908@item @emph{Reference}:
1909@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
1910@end table
1911
1912
1913
1914@node OMP_DISPLAY_ENV
1915@section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
1916@cindex Environment Variable
1917@table @asis
1918@item @emph{Description}:
1919If set to @code{TRUE}, the OpenMP version number and the values
1920associated with the OpenMP environment variables are printed to @code{stderr}.
1921If set to @code{VERBOSE}, it additionally shows the value of the environment
1922variables which are GNU extensions. If undefined or set to @code{FALSE},
1923this information will not be shown.
1924
1925
1926@item @emph{Reference}:
1927@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
1928@end table
1929
1930
1931
1932@node OMP_DEFAULT_DEVICE
1933@section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
1934@cindex Environment Variable
1935@table @asis
1936@item @emph{Description}:
1937Set to choose the device which is used in a @code{target} region, unless the
1938value is overridden by @code{omp_set_default_device} or by a @code{device}
1939clause. The value shall be the nonnegative device number. If no device with
1940the given device number exists, the code is executed on the host. If unset,
1941device number 0 will be used.
1942
1943
1944@item @emph{See also}:
1945@ref{omp_get_default_device}, @ref{omp_set_default_device},
1946
1947@item @emph{Reference}:
1948@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
1949@end table
1950
1951
1952
1953@node OMP_DYNAMIC
1954@section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
1955@cindex Environment Variable
1956@table @asis
1957@item @emph{Description}:
1958Enable or disable the dynamic adjustment of the number of threads
1959within a team. The value of this environment variable shall be
1960@code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
1961disabled by default.
1962
1963@item @emph{See also}:
1964@ref{omp_set_dynamic}
1965
1966@item @emph{Reference}:
1967@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
1968@end table
1969
1970
1971
1972@node OMP_MAX_ACTIVE_LEVELS
1973@section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
1974@cindex Environment Variable
1975@table @asis
1976@item @emph{Description}:
1977Specifies the initial value for the maximum number of nested parallel
1978regions. The value of this variable shall be a positive integer.
1979If undefined, then if @env{OMP_NESTED} is defined and set to true, or
1980if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
1981a list with more than one item, the maximum number of nested parallel
1982regions will be initialized to the largest number supported, otherwise
1983it will be set to one.
1984
1985@item @emph{See also}:
1986@ref{omp_set_max_active_levels}, @ref{OMP_NESTED}
1987
1988@item @emph{Reference}:
1989@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
1990@end table
1991
1992
1993
1994@node OMP_MAX_TASK_PRIORITY
@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum task priority value
1997@cindex Environment Variable
1998@table @asis
1999@item @emph{Description}:
Specifies the initial value for the maximum priority value that can be
set for a task. The value of this variable shall be a non-negative
integer. If undefined, the default priority is 0.
2004
2005@item @emph{See also}:
2006@ref{omp_get_max_task_priority}
2007
2008@item @emph{Reference}:
2009@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
2010@end table
2011
2012
2013
2014@node OMP_NESTED
2015@section @env{OMP_NESTED} -- Nested parallel regions
2016@cindex Environment Variable
2017@cindex Implementation specific setting
2018@table @asis
2019@item @emph{Description}:
2020Enable or disable nested parallel regions, i.e., whether team members
2021are allowed to create new teams. The value of this environment variable
shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the
maximum number of active nested regions supported will by default be
set to the maximum supported, otherwise it will be set to one. If
@env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting will override this
setting. If both are undefined, nested parallel regions are enabled if
@env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined to a list with
more than one item, otherwise they are disabled by default.
2029
2030@item @emph{See also}:
2031@ref{omp_set_max_active_levels}, @ref{omp_set_nested}
2032
2033@item @emph{Reference}:
2034@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
2035@end table
2036
2037
2038
2039@node OMP_NUM_TEAMS
2040@section @env{OMP_NUM_TEAMS} -- Specifies the number of teams to use by teams region
2041@cindex Environment Variable
2042@table @asis
2043@item @emph{Description}:
Specifies the upper bound for the number of teams to use in teams
regions without an explicit @code{num_teams} clause. The value of this
variable shall be a positive integer. If undefined, it defaults to 0,
which means an implementation-defined upper bound.
2048
2049@item @emph{See also}:
2050@ref{omp_set_num_teams}
2051
2052@item @emph{Reference}:
2053@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.23
2054@end table
2055
2056
2057
2058@node OMP_NUM_THREADS
2059@section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
2060@cindex Environment Variable
2061@cindex Implementation specific setting
2062@table @asis
2063@item @emph{Description}:
2064Specifies the default number of threads to use in parallel regions. The
2065value of this variable shall be a comma-separated list of positive integers;
2066the value specifies the number of threads to use for the corresponding nested
2067level. Specifying more than one item in the list will automatically enable
nesting by default. If undefined, one thread per CPU is used.
2069
2070@item @emph{See also}:
2071@ref{omp_set_num_threads}, @ref{OMP_NESTED}
2072
2073@item @emph{Reference}:
2074@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
2075@end table
2076
2077
2078
2079@node OMP_PROC_BIND
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
2081@cindex Environment Variable
2082@table @asis
2083@item @emph{Description}:
Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
2086they may be moved. Alternatively, a comma separated list with the
2087values @code{PRIMARY}, @code{MASTER}, @code{CLOSE} and @code{SPREAD} can
2088be used to specify the thread affinity policy for the corresponding nesting
2089level. With @code{PRIMARY} and @code{MASTER} the worker threads are in the
2090same place partition as the primary thread. With @code{CLOSE} those are
2091kept close to the primary thread in contiguous place partitions. And
2092with @code{SPREAD} a sparse distribution
2093across the place partitions is used. Specifying more than one item in the
2094list will automatically enable nesting by default.
2095
2096When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
2097@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
2098
2099@item @emph{See also}:
2100@ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY},
2101@ref{OMP_NESTED}, @ref{OMP_PLACES}
2102
2103@item @emph{Reference}:
2104@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
2105@end table
2106
2107
2108
2109@node OMP_PLACES
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
2111@cindex Environment Variable
2112@table @asis
2113@item @emph{Description}:
2114The thread placement can be either specified using an abstract name or by an
2115explicit list of the places. The abstract names @code{threads}, @code{cores},
2116@code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
followed by a positive number in parentheses, which denotes how many places
2118shall be created. With @code{threads} each place corresponds to a single
2119hardware thread; @code{cores} to a single core with the corresponding number of
2120hardware threads; with @code{sockets} the place corresponds to a single
2121socket; with @code{ll_caches} to a set of cores that shares the last level
2122cache on the device; and @code{numa_domains} to a set of cores for which their
2123closest memory on the device is the same memory and at a similar distance from
2124the cores. The resulting placement can be shown by setting the
2125@env{OMP_DISPLAY_ENV} environment variable.
2126
2127Alternatively, the placement can be specified explicitly as comma-separated
list of places. A place is specified by a set of nonnegative numbers in curly
2129braces, denoting the hardware threads. The curly braces can be omitted
2130when only a single number has been specified. The hardware threads
2131belonging to a place can either be specified as comma-separated list of
2132nonnegative thread numbers or using an interval. Multiple places can also be
2133either specified by a comma-separated list of places or by an interval. To
2134specify an interval, a colon followed by the count is placed after
2135the hardware thread number or the place. Optionally, the length can be
2136followed by a colon and the stride number -- otherwise a unit stride is
2137assumed. Placing an exclamation mark (@code{!}) directly before a curly
2138brace or numbers inside the curly braces (excluding intervals) will
2139exclude those hardware threads.
2140
For instance, the following three settings specify the same places list:
@code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
@code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.
2144
2145If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
2146@env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
2147between CPUs following no placement policy.
2148
2149@item @emph{See also}:
2150@ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
2151@ref{OMP_DISPLAY_ENV}
2152
2153@item @emph{Reference}:
2154@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
2155@end table
2156
2157
2158
2159@node OMP_STACKSIZE
2160@section @env{OMP_STACKSIZE} -- Set default thread stack size
2161@cindex Environment Variable
2162@table @asis
2163@item @emph{Description}:
2164Set the default thread stack size in kilobytes, unless the number
2165is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
2166case the size is, respectively, in bytes, kilobytes, megabytes
2167or gigabytes. This is different from @code{pthread_attr_setstacksize}
2168which gets the number of bytes as an argument. If the stack size cannot
2169be set due to system constraints, an error is reported and the initial
2170stack size is left unchanged. If undefined, the stack size is system
2171dependent.
2172
2173@item @emph{Reference}:
2174@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
2175@end table
2176
2177
2178
2179@node OMP_SCHEDULE
2180@section @env{OMP_SCHEDULE} -- How threads are scheduled
2181@cindex Environment Variable
2182@cindex Implementation specific setting
2183@table @asis
2184@item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form @code{type[,chunk]}, where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or
@code{auto}. The optional @code{chunk} size shall be a positive integer.
If undefined, dynamic scheduling and a chunk size of 1 are used.
2190
2191@item @emph{See also}:
2192@ref{omp_set_schedule}
2193
2194@item @emph{Reference}:
2195@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
2196@end table
2197
2198
2199
2200@node OMP_TARGET_OFFLOAD
2201@section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behaviour
2202@cindex Environment Variable
2203@cindex Implementation specific setting
2204@table @asis
2205@item @emph{Description}:
Specifies the behaviour with regard to offloading code to a device. This
variable can be set to one of three values: @code{MANDATORY},
@code{DISABLED} or @code{DEFAULT}.
2209
2210If set to @code{MANDATORY}, the program will terminate with an error if
2211the offload device is not present or is not supported. If set to
2212@code{DISABLED}, then offloading is disabled and all code will run on the
2213host. If set to @code{DEFAULT}, the program will try offloading to the
2214device first, then fall back to running code on the host if it cannot.
2215
2216If undefined, then the program will behave as if @code{DEFAULT} was set.
2217
2218@item @emph{Reference}:
2219@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.17
2220@end table
2221
2222
2223
2224@node OMP_TEAMS_THREAD_LIMIT
2225@section @env{OMP_TEAMS_THREAD_LIMIT} -- Set the maximum number of threads imposed by teams
2226@cindex Environment Variable
2227@table @asis
2228@item @emph{Description}:
Specifies an upper bound for the number of threads used by each
contention group created by a teams construct without an explicit
@code{thread_limit} clause. The value of this variable shall be a
positive integer. If undefined, the value 0 is used, which stands for
an implementation-defined upper limit.
2234
2235@item @emph{See also}:
2236@ref{OMP_THREAD_LIMIT}, @ref{omp_set_teams_thread_limit}
2237
2238@item @emph{Reference}:
2239@uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.24
2240@end table
2241
2242
2243
2244@node OMP_THREAD_LIMIT
2245@section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
2246@cindex Environment Variable
2247@table @asis
2248@item @emph{Description}:
2249Specifies the number of threads to use for the whole program. The
2250value of this variable shall be a positive integer. If undefined,
2251the number of threads is not limited.
2252
2253@item @emph{See also}:
2254@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
2255
2256@item @emph{Reference}:
2257@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
2258@end table
2259
2260
2261
2262@node OMP_WAIT_POLICY
2263@section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
2264@cindex Environment Variable
2265@table @asis
2266@item @emph{Description}:
Specifies whether waiting threads should be active or passive. If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; if the value is @code{ACTIVE}, they may. If
undefined, threads wait actively for a short time before waiting
passively.
2272
2273@item @emph{See also}:
2274@ref{GOMP_SPINCOUNT}
2275
2276@item @emph{Reference}:
2277@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
2278@end table
2279
2280
2281
2282@node GOMP_CPU_AFFINITY
2283@section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
2284@cindex Environment Variable
2285@table @asis
2286@item @emph{Description}:
2287Binds threads to specific CPUs. The variable should contain a space-separated
2288or comma-separated list of CPUs. This list may contain different kinds of
2289entries: either single CPU numbers in any order, a range of CPUs (M-N)
2290or a range with some stride (M-N:S). CPU numbers are zero based. For example,
2291@code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
2292to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
2293CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
2294and 14 respectively and then start assigning back from the beginning of
2295the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
2296
2297There is no libgomp library routine to determine whether a CPU affinity
2298specification is in effect. As a workaround, language-specific library
2299functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
2300Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
2301environment variable. A defined CPU affinity on startup cannot be changed
2302or disabled during the runtime of the application.
2303
If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence. If @env{GOMP_CPU_AFFINITY}
is unset and @env{OMP_PROC_BIND} is either unset or set to
@code{FALSE}, the host system will handle the assignment of threads to
CPUs.
2308
2309@item @emph{See also}:
2310@ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
2311@end table
2312
2313
2314
2315@node GOMP_DEBUG
2316@section @env{GOMP_DEBUG} -- Enable debugging output
2317@cindex Environment Variable
2318@table @asis
2319@item @emph{Description}:
2320Enable debugging output. The variable should be set to @code{0}
2321(disabled, also the default if not set), or @code{1} (enabled).
2322
2323If enabled, some debugging output will be printed during execution.
2324This is currently not specified in more detail, and subject to change.
2325@end table
2326
2327
2328
2329@node GOMP_STACKSIZE
2330@section @env{GOMP_STACKSIZE} -- Set default thread stack size
2331@cindex Environment Variable
2332@cindex Implementation specific setting
2333@table @asis
2334@item @emph{Description}:
2335Set the default thread stack size in kilobytes. This is different from
2336@code{pthread_attr_setstacksize} which gets the number of bytes as an
2337argument. If the stack size cannot be set due to system constraints, an
2338error is reported and the initial stack size is left unchanged. If undefined,
2339the stack size is system dependent.
2340
2341@item @emph{See also}:
2342@ref{OMP_STACKSIZE}
2343
2344@item @emph{Reference}:
2345@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
2346GCC Patches Mailinglist},
2347@uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
2348GCC Patches Mailinglist}
2349@end table
2350
2351
2352
2353@node GOMP_SPINCOUNT
2354@section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
2355@cindex Environment Variable
2356@cindex Implementation specific setting
2357@table @asis
2358@item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power. The value may be
either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
2362integer which gives the number of spins of the busy-wait loop. The
2363integer may optionally be followed by the following suffixes acting
2364as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
2365million), @code{G} (giga, billion), or @code{T} (tera, trillion).
2366If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
2367300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
236830 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used when @env{OMP_WAIT_POLICY} is @code{ACTIVE} or
undefined, respectively, unless @env{GOMP_SPINCOUNT} is lower or
@env{OMP_WAIT_POLICY} is @code{PASSIVE}.
2373
2374@item @emph{See also}:
2375@ref{OMP_WAIT_POLICY}
2376@end table
2377
2378
2379
2380@node GOMP_RTEMS_THREAD_POOLS
2381@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
2382@cindex Environment Variable
2383@cindex Implementation specific setting
2384@table @asis
2385@item @emph{Description}:
2386This environment variable is only used on the RTEMS real-time operating system.
2387It determines the scheduler instance specific thread pools. The format for
2388@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
2389@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
2390separated by @code{:} where:
2391@itemize @bullet
2392@item @code{<thread-pool-count>} is the thread pool count for this scheduler
2393instance.
2394@item @code{$<priority>} is an optional priority for the worker threads of a
2395thread pool according to @code{pthread_setschedparam}. In case a priority
2396value is omitted, then a worker thread will inherit the priority of the OpenMP
2397primary thread that created it. The priority of the worker thread is not
2398changed after creation, even if a new OpenMP primary thread using the worker has
2399a different priority.
2400@item @code{@@<scheduler-name>} is the scheduler instance name according to the
2401RTEMS application configuration.
2402@end itemize
2403In case no thread pool configuration is specified for a scheduler instance,
2404then each OpenMP primary thread of this scheduler instance will use its own
2405dynamically allocated thread pool. To limit the worker thread count of the
2406thread pools, each OpenMP primary thread must call @code{omp_set_num_threads}.
2407@item @emph{Example}:
Suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
2409@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
2410@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
2411scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
2412one thread pool available. Since no priority is specified for this scheduler
2413instance, the worker thread inherits the priority of the OpenMP primary thread
2414that created it. In the scheduler instance @code{WRK1} there are three thread
2415pools available and their worker threads run at priority four.
2416@end table
2417
2418
2419
2420@c ---------------------------------------------------------------------
2421@c Enabling OpenACC
2422@c ---------------------------------------------------------------------
2423
2424@node Enabling OpenACC
2425@chapter Enabling OpenACC
2426
2427To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
2428flag @option{-fopenacc} must be specified. This enables the OpenACC directive
2429@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
2430@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
2431@code{!$} conditional compilation sentinels in free form and @code{c$},
2432@code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
2433arranges for automatic linking of the OpenACC runtime library
2434(@ref{OpenACC Runtime Library Routines}).
2435
2436See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
2437
2438A complete description of all OpenACC directives accepted may be found in
2439the @uref{https://www.openacc.org, OpenACC} Application Programming
2440Interface manual, version 2.6.
2441
2442
2443
2444@c ---------------------------------------------------------------------
2445@c OpenACC Runtime Library Routines
2446@c ---------------------------------------------------------------------
2447
2448@node OpenACC Runtime Library Routines
2449@chapter OpenACC Runtime Library Routines
2450
The runtime routines described here are defined by section 3 of the
OpenACC specification in version 2.6.
They have C linkage and do not throw exceptions.
They are generally available only on the host, with the exception of
@code{acc_on_device}, which is available on both the host and the
accelerator device.
2457
2458@menu
2459* acc_get_num_devices:: Get number of devices for the given device
2460 type.
2461* acc_set_device_type:: Set type of device accelerator to use.
2462* acc_get_device_type:: Get type of device accelerator to be used.
2463* acc_set_device_num:: Set device number to use.
2464* acc_get_device_num:: Get device number to be used.
2465* acc_get_property:: Get device property.
* acc_async_test:: Test for completion of a specific asynchronous
                                 operation.
* acc_async_test_all:: Test for completion of all asynchronous
                                 operations.
* acc_wait:: Wait for completion of a specific asynchronous
                                 operation.
* acc_wait_all:: Wait for completion of all asynchronous
                                 operations.
* acc_wait_all_async:: Wait for completion of all asynchronous
                                 operations.
* acc_wait_async:: Wait for completion of asynchronous operations.
2477* acc_init:: Initialize runtime for a specific device type.
2478* acc_shutdown:: Shuts down the runtime for a specific device
2479 type.
2480* acc_on_device:: Whether executing on a particular device
2481* acc_malloc:: Allocate device memory.
2482* acc_free:: Free device memory.
2483* acc_copyin:: Allocate device memory and copy host memory to
2484 it.
2485* acc_present_or_copyin:: If the data is not present on the device,
2486 allocate device memory and copy from host
2487 memory.
2488* acc_create:: Allocate device memory and map it to host
2489 memory.
2490* acc_present_or_create:: If the data is not present on the device,
2491 allocate device memory and map it to host
2492 memory.
2493* acc_copyout:: Copy device memory to host memory.
2494* acc_delete:: Free device memory.
2495* acc_update_device:: Update device memory from mapped host memory.
2496* acc_update_self:: Update host memory from mapped device memory.
2497* acc_map_data:: Map previously allocated device memory to host
2498 memory.
2499* acc_unmap_data:: Unmap device memory from host memory.
2500* acc_deviceptr:: Get device pointer associated with specific
2501 host address.
2502* acc_hostptr:: Get host pointer associated with specific
2503 device address.
2504* acc_is_present:: Indicate whether host variable / array is
2505 present on device.
2506* acc_memcpy_to_device:: Copy host memory to device memory.
2507* acc_memcpy_from_device:: Copy device memory to host memory.
2508* acc_attach:: Let device pointer point to device-pointer target.
2509* acc_detach:: Let device pointer point to host-pointer target.
2510
2511API routines for target platforms.
2512
2513* acc_get_current_cuda_device:: Get CUDA device handle.
2514* acc_get_current_cuda_context::Get CUDA context handle.
2515* acc_get_cuda_stream:: Get CUDA stream handle.
2516* acc_set_cuda_stream:: Set CUDA stream handle.
2517
2518API routines for the OpenACC Profiling Interface.
2519
2520* acc_prof_register:: Register callbacks.
2521* acc_prof_unregister:: Unregister callbacks.
2522* acc_prof_lookup:: Obtain inquiry functions.
2523* acc_register_library:: Library registration.
2524@end menu
2525
2526
2527
2528@node acc_get_num_devices
2529@section @code{acc_get_num_devices} -- Get number of devices for given device type
2530@table @asis
2531@item @emph{Description}
2532This function returns a value indicating the number of devices available
2533for the device type specified in @var{devicetype}.
2534
2535@item @emph{C/C++}:
2536@multitable @columnfractions .20 .80
2537@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
2538@end multitable
2539
2540@item @emph{Fortran}:
2541@multitable @columnfractions .20 .80
2542@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
2543@item @tab @code{integer(kind=acc_device_kind) devicetype}
2544@end multitable
2545
2546@item @emph{Reference}:
2547@uref{https://www.openacc.org, OpenACC specification v2.6}, section
25483.2.1.
2549@end table
2550
2551
2552
2553@node acc_set_device_type
2554@section @code{acc_set_device_type} -- Set type of device accelerator to use.
2555@table @asis
2556@item @emph{Description}
2557This function indicates to the runtime library which device type, specified
2558in @var{devicetype}, to use when executing a parallel or kernels region.
2559
2560@item @emph{C/C++}:
2561@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_set_device_type(acc_device_t devicetype);}
2563@end multitable
2564
2565@item @emph{Fortran}:
2566@multitable @columnfractions .20 .80
2567@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
2568@item @tab @code{integer(kind=acc_device_kind) devicetype}
2569@end multitable
2570
2571@item @emph{Reference}:
2572@uref{https://www.openacc.org, OpenACC specification v2.6}, section
25733.2.2.
2574@end table
2575
2576
2577
2578@node acc_get_device_type
2579@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
2580@table @asis
2581@item @emph{Description}
2582This function returns what device type will be used when executing a
2583parallel or kernels region.
2584
2585This function returns @code{acc_device_none} if
2586@code{acc_get_device_type} is called from
2587@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
2588callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
2589Interface}), that is, if the device is currently being initialized.
2590
2591@item @emph{C/C++}:
2592@multitable @columnfractions .20 .80
2593@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
2594@end multitable
2595
2596@item @emph{Fortran}:
2597@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_get_device_type()}
2599@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
2600@end multitable
2601
2602@item @emph{Reference}:
2603@uref{https://www.openacc.org, OpenACC specification v2.6}, section
26043.2.3.
2605@end table
2606
2607
2608
2609@node acc_set_device_num
2610@section @code{acc_set_device_num} -- Set device number to use.
2611@table @asis
2612@item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{devicenum} and associated with the specified device type
@var{devicetype}, to use.
2616
2617@item @emph{C/C++}:
2618@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_set_device_num(int devicenum, acc_device_t devicetype);}
2620@end multitable
2621
2622@item @emph{Fortran}:
2623@multitable @columnfractions .20 .80
2624@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
2625@item @tab @code{integer devicenum}
2626@item @tab @code{integer(kind=acc_device_kind) devicetype}
2627@end multitable
2628
2629@item @emph{Reference}:
2630@uref{https://www.openacc.org, OpenACC specification v2.6}, section
26313.2.4.
2632@end table
2633
2634
2635
2636@node acc_get_device_num
2637@section @code{acc_get_device_num} -- Get device number to be used.
2638@table @asis
2639@item @emph{Description}
This function returns the device number, associated with the specified
device type @var{devicetype}, that will be used when executing a
parallel or kernels region.
2643
2644@item @emph{C/C++}:
2645@multitable @columnfractions .20 .80
2646@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
2647@end multitable
2648
2649@item @emph{Fortran}:
2650@multitable @columnfractions .20 .80
2651@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
2652@item @tab @code{integer(kind=acc_device_kind) devicetype}
2653@item @tab @code{integer acc_get_device_num}
2654@end multitable
2655
2656@item @emph{Reference}:
2657@uref{https://www.openacc.org, OpenACC specification v2.6}, section
26583.2.5.
2659@end table
2660
2661
2662
2663@node acc_get_property
2664@section @code{acc_get_property} -- Get device property.
2665@cindex acc_get_property
2666@cindex acc_get_property_string
2667@table @asis
2668@item @emph{Description}
2669These routines return the value of the specified @var{property} for the
2670device being queried according to @var{devicenum} and @var{devicetype}.
2671Integer-valued and string-valued properties are returned by
2672@code{acc_get_property} and @code{acc_get_property_string} respectively.
2673The Fortran @code{acc_get_property_string} subroutine returns the string
2674retrieved in its fourth argument while the remaining entry points are
2675functions, which pass the return value as their result.
2676
Note, for Fortran only: the OpenACC technical committee corrected and,
hence, modified the interface introduced in OpenACC 2.6.  The kind-value
parameter @code{acc_device_property} has been renamed to
@code{acc_device_property_kind} for consistency, and the return type of
the @code{acc_get_property} function is now a @code{c_size_t} integer
instead of an @code{acc_device_property} integer.  The parameter
@code{acc_device_property} will continue to be provided,
but might be removed in a future version of GCC.
2684
2685@item @emph{C/C++}:
2686@multitable @columnfractions .20 .80
2687@item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
2688@item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
2689@end multitable
2690
2691@item @emph{Fortran}:
2692@multitable @columnfractions .20 .80
2693@item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
2694@item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
2695@item @tab @code{use ISO_C_Binding, only: c_size_t}
2696@item @tab @code{integer devicenum}
2697@item @tab @code{integer(kind=acc_device_kind) devicetype}
2698@item @tab @code{integer(kind=acc_device_property_kind) property}
2699@item @tab @code{integer(kind=c_size_t) acc_get_property}
2700@item @tab @code{character(*) string}
2701@end multitable
2702
2703@item @emph{Reference}:
2704@uref{https://www.openacc.org, OpenACC specification v2.6}, section
27053.2.6.
2706@end table
2707
2708
2709
2710@node acc_async_test
2711@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
2712@table @asis
2713@item @emph{Description}
This function tests for completion of the asynchronous operation
specified in @var{arg}.  In C/C++, a non-zero value is returned if the
specified asynchronous operation has completed and zero if it has not;
in Fortran, @code{.true.} or @code{.false.} is returned, respectively.
2719
2720@item @emph{C/C++}:
2721@multitable @columnfractions .20 .80
2722@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
2723@end multitable
2724
2725@item @emph{Fortran}:
2726@multitable @columnfractions .20 .80
2727@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
2728@item @tab @code{integer(kind=acc_handle_kind) arg}
2729@item @tab @code{logical acc_async_test}
2730@end multitable
2731
2732@item @emph{Reference}:
2733@uref{https://www.openacc.org, OpenACC specification v2.6}, section
27343.2.9.
2735@end table
2736
2737
2738
2739@node acc_async_test_all
@section @code{acc_async_test_all} -- Test for completion of all asynchronous operations.
2741@table @asis
2742@item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned if all asynchronous operations
have completed and zero if any has not; in Fortran, @code{.true.} or
@code{.false.} is returned, respectively.
2748
2749@item @emph{C/C++}:
2750@multitable @columnfractions .20 .80
2751@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
2752@end multitable
2753
2754@item @emph{Fortran}:
2755@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
2758@end multitable
2759
2760@item @emph{Reference}:
2761@uref{https://www.openacc.org, OpenACC specification v2.6}, section
27623.2.10.
2763@end table
2764
2765
2766
2767@node acc_wait
2768@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
2769@table @asis
2770@item @emph{Description}
2771This function waits for completion of the asynchronous operation
2772specified in @var{arg}.
2773
2774@item @emph{C/C++}:
2775@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{void acc_async_wait(int arg);}
2778@end multitable
2779
2780@item @emph{Fortran}:
2781@multitable @columnfractions .20 .80
2782@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
2783@item @tab @code{integer(acc_handle_kind) arg}
2784@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
2785@item @tab @code{integer(acc_handle_kind) arg}
2786@end multitable
2787
2788@item @emph{Reference}:
2789@uref{https://www.openacc.org, OpenACC specification v2.6}, section
27903.2.11.
2791@end table
2792
2793
2794
2795@node acc_wait_all
@section @code{acc_wait_all} -- Wait for completion of all asynchronous operations.
2797@table @asis
2798@item @emph{Description}
2799This function waits for the completion of all asynchronous operations.
2800
2801@item @emph{C/C++}:
2802@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{void acc_async_wait_all(void);}
2805@end multitable
2806
2807@item @emph{Fortran}:
2808@multitable @columnfractions .20 .80
2809@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
2810@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
2811@end multitable
2812
2813@item @emph{Reference}:
2814@uref{https://www.openacc.org, OpenACC specification v2.6}, section
28153.2.13.
2816@end table
2817
2818
2819
2820@node acc_wait_all_async
2821@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
2822@table @asis
2823@item @emph{Description}
2824This function enqueues a wait operation on the queue @var{async} for any
2825and all asynchronous operations that have been previously enqueued on
2826any queue.
2827
2828@item @emph{C/C++}:
2829@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait_all_async(int async);}
2831@end multitable
2832
2833@item @emph{Fortran}:
2834@multitable @columnfractions .20 .80
2835@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
2836@item @tab @code{integer(acc_handle_kind) async}
2837@end multitable
2838
2839@item @emph{Reference}:
2840@uref{https://www.openacc.org, OpenACC specification v2.6}, section
28413.2.14.
2842@end table
2843
2844
2845
2846@node acc_wait_async
2847@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
2848@table @asis
2849@item @emph{Description}
2850This function enqueues a wait operation on queue @var{async} for any and all
2851asynchronous operations enqueued on queue @var{arg}.
2852
2853@item @emph{C/C++}:
2854@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait_async(int arg, int async);}
2856@end multitable
2857
2858@item @emph{Fortran}:
2859@multitable @columnfractions .20 .80
2860@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
2861@item @tab @code{integer(acc_handle_kind) arg, async}
2862@end multitable
2863
2864@item @emph{Reference}:
2865@uref{https://www.openacc.org, OpenACC specification v2.6}, section
28663.2.12.
2867@end table
2868
2869
2870
2871@node acc_init
2872@section @code{acc_init} -- Initialize runtime for a specific device type.
2873@table @asis
2874@item @emph{Description}
2875This function initializes the runtime for the device type specified in
2876@var{devicetype}.
2877
2878@item @emph{C/C++}:
2879@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_init(acc_device_t devicetype);}
2881@end multitable
2882
2883@item @emph{Fortran}:
2884@multitable @columnfractions .20 .80
2885@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
2886@item @tab @code{integer(acc_device_kind) devicetype}
2887@end multitable
2888
2889@item @emph{Reference}:
2890@uref{https://www.openacc.org, OpenACC specification v2.6}, section
28913.2.7.
2892@end table
2893
2894
2895
2896@node acc_shutdown
2897@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
2898@table @asis
2899@item @emph{Description}
2900This function shuts down the runtime for the device type specified in
2901@var{devicetype}.
2902
2903@item @emph{C/C++}:
2904@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_shutdown(acc_device_t devicetype);}
2906@end multitable
2907
2908@item @emph{Fortran}:
2909@multitable @columnfractions .20 .80
2910@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
2911@item @tab @code{integer(acc_device_kind) devicetype}
2912@end multitable
2913
2914@item @emph{Reference}:
2915@uref{https://www.openacc.org, OpenACC specification v2.6}, section
29163.2.8.
2917@end table
2918
2919
2920
2921@node acc_on_device
2922@section @code{acc_on_device} -- Whether executing on a particular device
2923@table @asis
@item @emph{Description}
This function returns whether the program is executing on a particular
device, specified in @var{devicetype}.  In C/C++, a non-zero value is
returned if the program is executing on the specified device type and
zero if it is not; in Fortran, @code{.true.} or @code{.false.} is
returned, respectively.
2931
2932@item @emph{C/C++}:
2933@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
2935@end multitable
2936
2937@item @emph{Fortran}:
2938@multitable @columnfractions .20 .80
2939@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
2940@item @tab @code{integer(acc_device_kind) devicetype}
2941@item @tab @code{logical acc_on_device}
2942@end multitable
2943
2944
2945@item @emph{Reference}:
2946@uref{https://www.openacc.org, OpenACC specification v2.6}, section
29473.2.17.
2948@end table
2949
2950
2951
2952@node acc_malloc
2953@section @code{acc_malloc} -- Allocate device memory.
2954@table @asis
2955@item @emph{Description}
2956This function allocates @var{len} bytes of device memory. It returns
2957the device address of the allocated memory.
2958
2959@item @emph{C/C++}:
2960@multitable @columnfractions .20 .80
2961@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
2962@end multitable
2963
2964@item @emph{Reference}:
2965@uref{https://www.openacc.org, OpenACC specification v2.6}, section
29663.2.18.
2967@end table
2968
2969
2970
2971@node acc_free
2972@section @code{acc_free} -- Free device memory.
2973@table @asis
2974@item @emph{Description}
Free previously allocated device memory at the device address @var{a}.
2976
2977@item @emph{C/C++}:
2978@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_free(d_void *a);}
2980@end multitable
2981
2982@item @emph{Reference}:
2983@uref{https://www.openacc.org, OpenACC specification v2.6}, section
29843.2.19.
2985@end table
2986
2987
2988
2989@node acc_copyin
2990@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
2991@table @asis
2992@item @emph{Description}
2993In C/C++, this function allocates @var{len} bytes of device memory
2994and maps it to the specified host address in @var{a}. The device
2995address of the newly allocated device memory is returned.
2996
In Fortran, two forms are supported.  In the first form, @var{a}
specifies a contiguous array section.  In the second form, @var{a}
specifies a variable or array element and @var{len} specifies the
length in bytes.
3000
3001@item @emph{C/C++}:
3002@multitable @columnfractions .20 .80
3003@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
3004@item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
3005@end multitable
3006
3007@item @emph{Fortran}:
3008@multitable @columnfractions .20 .80
3009@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
3010@item @tab @code{type, dimension(:[,:]...) :: a}
3011@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
3012@item @tab @code{type, dimension(:[,:]...) :: a}
3013@item @tab @code{integer len}
3014@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
3015@item @tab @code{type, dimension(:[,:]...) :: a}
3016@item @tab @code{integer(acc_handle_kind) :: async}
3017@item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
3018@item @tab @code{type, dimension(:[,:]...) :: a}
3019@item @tab @code{integer len}
3020@item @tab @code{integer(acc_handle_kind) :: async}
3021@end multitable
3022
3023@item @emph{Reference}:
3024@uref{https://www.openacc.org, OpenACC specification v2.6}, section
30253.2.20.
3026@end table
3027
3028
3029
3030@node acc_present_or_copyin
3031@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
3032@table @asis
3033@item @emph{Description}
3034This function tests if the host data specified by @var{a} and of length
3035@var{len} is present or not. If it is not present, then device memory
3036will be allocated and the host memory copied. The device address of
3037the newly allocated device memory is returned.
3038
In Fortran, two forms are supported.  In the first form, @var{a}
specifies a contiguous array section.  In the second form, @var{a}
specifies a variable or array element and @var{len} specifies the
length in bytes.
3042
3043Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
3044backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.
3045
3046@item @emph{C/C++}:
3047@multitable @columnfractions .20 .80
3048@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
3049@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
3050@end multitable
3051
3052@item @emph{Fortran}:
3053@multitable @columnfractions .20 .80
3054@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
3055@item @tab @code{type, dimension(:[,:]...) :: a}
3056@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
3057@item @tab @code{type, dimension(:[,:]...) :: a}
3058@item @tab @code{integer len}
3059@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
3060@item @tab @code{type, dimension(:[,:]...) :: a}
3061@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
3062@item @tab @code{type, dimension(:[,:]...) :: a}
3063@item @tab @code{integer len}
3064@end multitable
3065
3066@item @emph{Reference}:
3067@uref{https://www.openacc.org, OpenACC specification v2.6}, section
30683.2.20.
3069@end table
3070
3071
3072
3073@node acc_create
3074@section @code{acc_create} -- Allocate device memory and map it to host memory.
3075@table @asis
3076@item @emph{Description}
3077This function allocates device memory and maps it to host memory specified
3078by the host address @var{a} with a length of @var{len} bytes. In C/C++,
3079the function returns the device address of the allocated device memory.
3080
In Fortran, two forms are supported.  In the first form, @var{a}
specifies a contiguous array section.  In the second form, @var{a}
specifies a variable or array element and @var{len} specifies the
length in bytes.
3084
3085@item @emph{C/C++}:
3086@multitable @columnfractions .20 .80
3087@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
3088@item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
3089@end multitable
3090
3091@item @emph{Fortran}:
3092@multitable @columnfractions .20 .80
3093@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
3094@item @tab @code{type, dimension(:[,:]...) :: a}
3095@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
3096@item @tab @code{type, dimension(:[,:]...) :: a}
3097@item @tab @code{integer len}
3098@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
3099@item @tab @code{type, dimension(:[,:]...) :: a}
3100@item @tab @code{integer(acc_handle_kind) :: async}
3101@item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
3102@item @tab @code{type, dimension(:[,:]...) :: a}
3103@item @tab @code{integer len}
3104@item @tab @code{integer(acc_handle_kind) :: async}
3105@end multitable
3106
3107@item @emph{Reference}:
3108@uref{https://www.openacc.org, OpenACC specification v2.6}, section
31093.2.21.
3110@end table
3111
3112
3113
3114@node acc_present_or_create
3115@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
3116@table @asis
3117@item @emph{Description}
3118This function tests if the host data specified by @var{a} and of length
3119@var{len} is present or not. If it is not present, then device memory
3120will be allocated and mapped to host memory. In C/C++, the device address
3121of the newly allocated device memory is returned.
3122
In Fortran, two forms are supported.  In the first form, @var{a}
specifies a contiguous array section.  In the second form, @var{a}
specifies a variable or array element and @var{len} specifies the
length in bytes.
3126
3127Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
3128backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.
3129
3130@item @emph{C/C++}:
3131@multitable @columnfractions .20 .80
3132@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
3133@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
3134@end multitable
3135
3136@item @emph{Fortran}:
3137@multitable @columnfractions .20 .80
3138@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
3139@item @tab @code{type, dimension(:[,:]...) :: a}
3140@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
3141@item @tab @code{type, dimension(:[,:]...) :: a}
3142@item @tab @code{integer len}
3143@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
3144@item @tab @code{type, dimension(:[,:]...) :: a}
3145@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
3146@item @tab @code{type, dimension(:[,:]...) :: a}
3147@item @tab @code{integer len}
3148@end multitable
3149
3150@item @emph{Reference}:
3151@uref{https://www.openacc.org, OpenACC specification v2.6}, section
31523.2.21.
3153@end table
3154
3155
3156
3157@node acc_copyout
3158@section @code{acc_copyout} -- Copy device memory to host memory.
3159@table @asis
3160@item @emph{Description}
In C/C++, this function copies mapped device memory to the host memory
specified by host address @var{a} for a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a}
specifies a contiguous array section.  In the second form, @var{a}
specifies a variable or array element and @var{len} specifies the
length in bytes.
3167
3168@item @emph{C/C++}:
3169@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_copyout(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_copyout_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{void acc_copyout_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_copyout_finalize_async(h_void *a, size_t len, int async);}
3174@end multitable
3175
3176@item @emph{Fortran}:
3177@multitable @columnfractions .20 .80
3178@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
3179@item @tab @code{type, dimension(:[,:]...) :: a}
3180@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
3181@item @tab @code{type, dimension(:[,:]...) :: a}
3182@item @tab @code{integer len}
3183@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
3184@item @tab @code{type, dimension(:[,:]...) :: a}
3185@item @tab @code{integer(acc_handle_kind) :: async}
3186@item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
3187@item @tab @code{type, dimension(:[,:]...) :: a}
3188@item @tab @code{integer len}
3189@item @tab @code{integer(acc_handle_kind) :: async}
3190@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
3191@item @tab @code{type, dimension(:[,:]...) :: a}
3192@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
3193@item @tab @code{type, dimension(:[,:]...) :: a}
3194@item @tab @code{integer len}
3195@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
3196@item @tab @code{type, dimension(:[,:]...) :: a}
3197@item @tab @code{integer(acc_handle_kind) :: async}
3198@item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
3199@item @tab @code{type, dimension(:[,:]...) :: a}
3200@item @tab @code{integer len}
3201@item @tab @code{integer(acc_handle_kind) :: async}
3202@end multitable
3203
3204@item @emph{Reference}:
3205@uref{https://www.openacc.org, OpenACC specification v2.6}, section
32063.2.22.
3207@end table
3208
3209
3210
3211@node acc_delete
3212@section @code{acc_delete} -- Free device memory.
3213@table @asis
3214@item @emph{Description}
This function frees previously allocated device memory specified by
the host address @var{a} and a length of @var{len} bytes.

In Fortran, two forms are supported.  In the first form, @var{a}
specifies a contiguous array section.  In the second form, @var{a}
specifies a variable or array element and @var{len} specifies the
length in bytes.
3221
3222@item @emph{C/C++}:
3223@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_delete(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_delete_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{void acc_delete_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_delete_finalize_async(h_void *a, size_t len, int async);}
3228@end multitable
3229
3230@item @emph{Fortran}:
3231@multitable @columnfractions .20 .80
3232@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
3233@item @tab @code{type, dimension(:[,:]...) :: a}
3234@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
3235@item @tab @code{type, dimension(:[,:]...) :: a}
3236@item @tab @code{integer len}
3237@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
3238@item @tab @code{type, dimension(:[,:]...) :: a}
3239@item @tab @code{integer(acc_handle_kind) :: async}
3240@item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
3241@item @tab @code{type, dimension(:[,:]...) :: a}
3242@item @tab @code{integer len}
3243@item @tab @code{integer(acc_handle_kind) :: async}
3244@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
3245@item @tab @code{type, dimension(:[,:]...) :: a}
3246@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
3247@item @tab @code{type, dimension(:[,:]...) :: a}
3248@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
3253@item @tab @code{type, dimension(:[,:]...) :: a}
3254@item @tab @code{integer len}
3255@item @tab @code{integer(acc_handle_kind) :: async}
3256@end multitable
3257
3258@item @emph{Reference}:
3259@uref{https://www.openacc.org, OpenACC specification v2.6}, section
32603.2.23.
3261@end table
3262
3263
3264
@node acc_update_device
@section @code{acc_update_device} -- Update device memory from mapped host memory.
@table @asis
@item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.24.
@end table



@node acc_update_self
@section @code{acc_update_self} -- Update host memory from mapped device memory.
@table @asis
@item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{integer(acc_handle_kind) :: async}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.25.
@end table



@node acc_map_data
@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
@table @asis
@item @emph{Description}
This function maps previously allocated device and host memory. The device
memory is specified with the device address @var{d}. The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.26.
@end table



@node acc_unmap_data
@section @code{acc_unmap_data} -- Unmap device memory from host memory.
@table @asis
@item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.27.
@end table



@node acc_deviceptr
@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
@table @asis
@item @emph{Description}
This function returns the device address that has been mapped to the
host address specified by @var{h}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.28.
@end table



@node acc_hostptr
@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
@table @asis
@item @emph{Description}
This function returns the host address that has been mapped to the
device address specified by @var{d}.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.29.
@end table



@node acc_is_present
@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
@table @asis
@item @emph{Description}
This function indicates whether the specified host address in @var{a} and a
length of @var{len} bytes is present on the device. In C/C++, a non-zero
value is returned to indicate the presence of the mapped memory on the
device. A zero is returned to indicate the memory is not mapped on the
device.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable or
array element and @var{len} specifies the length in bytes. If the host
memory is mapped to device memory, then @code{true} is returned. Otherwise,
@code{false} is returned to indicate the mapped memory is not present.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
@end multitable

@item @emph{Fortran}:
@multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_is_present(a)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{logical acc_is_present}
@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
@item @tab @code{type, dimension(:[,:]...) :: a}
@item @tab @code{integer len}
@item @tab @code{logical acc_is_present}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.30.
@end table


@node acc_memcpy_to_device
@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
@table @asis
@item @emph{Description}
This function copies host memory specified by the host address @var{src} to
device memory specified by the device address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.31.
@end table



@node acc_memcpy_from_device
@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
@table @asis
@item @emph{Description}
This function copies device memory specified by the device address @var{src}
to host memory specified by the host address @var{dest} for a length of
@var{bytes} bytes.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.32.
@end table


@node acc_attach
@section @code{acc_attach} -- Let device pointer point to device-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a host-pointer
address to pointing to the corresponding device data.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.34.
@end table



@node acc_detach
@section @code{acc_detach} -- Let device pointer point to host-pointer target.
@table @asis
@item @emph{Description}
This function updates a pointer on the device from pointing to a device-pointer
address to pointing to the corresponding host data.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.35.
@end table



@node acc_get_current_cuda_device
@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
@table @asis
@item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as that used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.1.
@end table



@node acc_get_current_cuda_context
@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
@table @asis
@item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as that used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.2.
@end table



@node acc_get_cuda_stream
@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
@table @asis
@item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as that used by the CUDA Runtime or Driver APIs.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.3.
@end table



@node acc_set_cuda_stream
@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
@table @asis
@item @emph{Description}
This function associates the stream handle specified by @var{stream} with
the queue @var{async}.

This cannot be used to change the stream handle associated with
@code{acc_async_sync}.

The return value is not specified.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
@end multitable

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
A.2.1.4.
@end table



@node acc_prof_register
@section @code{acc_prof_register} -- Register callbacks.
@table @asis
@item @emph{Description}:
This function registers callbacks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_prof_unregister
@section @code{acc_prof_unregister} -- Unregister callbacks.
@table @asis
@item @emph{Description}:
This function unregisters callbacks.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_prof_lookup
@section @code{acc_prof_lookup} -- Obtain inquiry functions.
@table @asis
@item @emph{Description}:
Function to obtain inquiry functions.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@node acc_register_library
@section @code{acc_register_library} -- Library registration.
@table @asis
@item @emph{Description}:
Function for library registration.

@item @emph{C/C++}:
@multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
@end multitable

@item @emph{See also}:
@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
5.3.
@end table



@c ---------------------------------------------------------------------
@c OpenACC Environment Variables
@c ---------------------------------------------------------------------

@node OpenACC Environment Variables
@chapter OpenACC Environment Variables

The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
are defined by section 4 of the OpenACC specification in version 2.0.
The variable @env{ACC_PROFLIB}
is defined by section 4 of the OpenACC specification in version 2.6.
The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.

@menu
* ACC_DEVICE_TYPE::
* ACC_DEVICE_NUM::
* ACC_PROFLIB::
* GCC_ACC_NOTIFY::
@end menu



@node ACC_DEVICE_TYPE
@section @code{ACC_DEVICE_TYPE}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.1.
@end table



@node ACC_DEVICE_NUM
@section @code{ACC_DEVICE_NUM}
@table @asis
@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.2.
@end table



@node ACC_PROFLIB
@section @code{ACC_PROFLIB}
@table @asis
@item @emph{See also}:
@ref{acc_register_library}, @ref{OpenACC Profiling Interface}

@item @emph{Reference}:
@uref{https://www.openacc.org, OpenACC specification v2.6}, section
4.3.
@end table



@node GCC_ACC_NOTIFY
@section @code{GCC_ACC_NOTIFY}
@table @asis
@item @emph{Description}:
Print debug information pertaining to the accelerator.
@end table



@c ---------------------------------------------------------------------
@c CUDA Streams Usage
@c ---------------------------------------------------------------------

@node CUDA Streams Usage
@chapter CUDA Streams Usage

This applies to the @code{nvptx} plugin only.

The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs. This
asynchronous functionality is implemented by making use of CUDA
streams@footnote{See "Stream Management" in "CUDA Driver API",
TRM-06703-001, Version 5.5, for additional information}.

The primary means by which the asynchronous functionality is accessed
is through the use of those OpenACC directives which make use of the
@code{async} and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.

Following the creation of an association between a CUDA stream and the
@code{async-argument} of an @code{async} clause, both the @code{wait}
clause and the @code{wait} directive can be used. When either the
clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, stream, have
completed.

Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies that the association between the @code{async-argument}
and the CUDA stream is maintained for the lifetime of the program.
However, this association can be changed through the use of the library
function @code{acc_set_cuda_stream}. When the function
@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause is destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} refer to a different
CUDA stream.



@c ---------------------------------------------------------------------
@c OpenACC Library Interoperability
@c ---------------------------------------------------------------------

@node OpenACC Library Interoperability
@chapter OpenACC Library Interoperability

@section Introduction

The OpenACC library uses the CUDA Driver API, and may interact with
programs that use the Runtime library directly, or another library
based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
"Interactions with the CUDA Driver API" in
"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
for additional information on library interoperability.}.
This chapter describes the use cases and what changes are
required in order to use both the OpenACC library and the CUBLAS and Runtime
libraries within a program.

@section First invocation: NVIDIA CUBLAS library API

In this first use case (see below), a function in the CUBLAS library is called
prior to any of the functions in the OpenACC library. More specifically, the
function @code{cublasCreate()}.

When invoked, the function initializes the library and allocates the
hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation have completed, a handle is returned to the
caller. The OpenACC library also requires initialization and allocation of
hardware resources. Since the CUBLAS library has already allocated the
hardware resources for the device, all that is left to do is to initialize
the OpenACC library and acquire the hardware resources on the host.

Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Invoking the
runtime library function @code{cudaGetDevice()} accomplishes this. Once
acquired, the device number is passed along with the device type as
parameters to the OpenACC library function @code{acc_set_device_num()}.

Once the call to @code{acc_set_device_num()} has completed, the OpenACC
library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries will be sharing the
same context.

@smallexample
  /* Create the handle */
  s = cublasCreate(&h);
  if (s != CUBLAS_STATUS_SUCCESS)
    @{
      fprintf(stderr, "cublasCreate failed %d\n", s);
      exit(EXIT_FAILURE);
    @}

  /* Get the device number */
  e = cudaGetDevice(&dev);
  if (e != cudaSuccess)
    @{
      fprintf(stderr, "cudaGetDevice failed %d\n", e);
      exit(EXIT_FAILURE);
    @}

  /* Initialize OpenACC library and use device 'dev' */
  acc_set_device_num(dev, acc_device_nvidia);

@end smallexample
@center Use Case 1

@section First invocation: OpenACC library API

In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library. More specifically,
the function @code{acc_set_device_num()}.

In the use case presented here, the function @code{acc_set_device_num()}
is used to both initialize the OpenACC library and allocate the hardware
resources on the host and the device. In the call to the function, the
call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
is but one method to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables; these are discussed in the next section.

Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called as seen with multiple calls being made to
@code{acc_copyin()}. In addition, calls can be made to functions in the
CUBLAS library. In this use case, a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
As seen in the previous use case, a call to @code{cublasCreate()}
initializes the CUBLAS library and allocates the hardware resources on the
host and the device. However, since the device has already been allocated,
@code{cublasCreate()} will only initialize the CUBLAS library and allocate
the appropriate hardware resources on the host. The context that was created
as part of the OpenACC initialization is shared with the CUBLAS library,
similarly to the first use case.

@smallexample
  dev = 0;

  acc_set_device_num(dev, acc_device_nvidia);

  /* Copy the first set to the device */
  d_X = acc_copyin(&h_X[0], N * sizeof (float));
  if (d_X == NULL)
    @{
      fprintf(stderr, "copyin error h_X\n");
      exit(EXIT_FAILURE);
    @}

  /* Copy the second set to the device */
  d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
  if (d_Y == NULL)
    @{
      fprintf(stderr, "copyin error h_Y1\n");
      exit(EXIT_FAILURE);
    @}

  /* Create the handle */
  s = cublasCreate(&h);
  if (s != CUBLAS_STATUS_SUCCESS)
    @{
      fprintf(stderr, "cublasCreate failed %d\n", s);
      exit(EXIT_FAILURE);
    @}

  /* Perform saxpy using CUBLAS library function */
  s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
  if (s != CUBLAS_STATUS_SUCCESS)
    @{
      fprintf(stderr, "cublasSaxpy failed %d\n", s);
      exit(EXIT_FAILURE);
    @}

  /* Copy the results from the device */
  acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));

@end smallexample
@center Use Case 2

@section OpenACC library and environment variables

There are two environment variables associated with the OpenACC library
that may be used to control the device type and device number:
@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
environment variables can be used as an alternative to calling
@code{acc_set_device_num()}. As seen in the second use case, the device
type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.

The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}.@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC
specification v2.6}.}

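For example, the effect of the @code{acc_set_device_num()} call in use case 2
can be obtained from the environment instead (a sketch; the program name
@file{./my_acc_program} is hypothetical):

```shell
# Select the device type and number before the program initializes
# the OpenACC runtime; no acc_set_device_num() call is then needed.
export ACC_DEVICE_TYPE=nvidia   # or e.g. "host" for host fallback
export ACC_DEVICE_NUM=0
./my_acc_program                # hypothetical program name
```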

@c ---------------------------------------------------------------------
@c OpenACC Profiling Interface
@c ---------------------------------------------------------------------

@node OpenACC Profiling Interface
@chapter OpenACC Profiling Interface

@section Implementation Status and Implementation-Defined Behavior

We're implementing the OpenACC Profiling Interface as defined by the
OpenACC 2.6 specification. We're clarifying some aspects here as
@emph{implementation-defined behavior}, while they're still under
discussion within the OpenACC Technical Committee.

This implementation is tuned to keep the performance impact as low as
possible for the (very common) case that the Profiling Interface is
not enabled. This is relevant, as the Profiling Interface affects all
the @emph{hot} code paths (in the target code, not in the offloaded
code). Users of the OpenACC Profiling Interface can be expected to
understand that performance will be impacted to some degree once the
Profiling Interface has been enabled: for example, because of the
@emph{runtime} (libgomp) calling into a third-party @emph{library} for
every event that has been registered.

We're not yet accounting for the fact that @cite{OpenACC events may
occur during event processing}.
We just handle one case specially, as required by CUDA 9.0
@command{nvprof}: @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks.

We're not yet implementing initialization via an
@code{acc_register_library} function that is either statically linked
in, or dynamically via @env{LD_PRELOAD}.
Initialization via @code{acc_register_library} functions dynamically
loaded via the @env{ACC_PROFLIB} environment variable does work, as
does directly calling @code{acc_prof_register},
@code{acc_prof_unregister}, and @code{acc_prof_lookup}.

As currently there are no inquiry functions defined, calls to
@code{acc_prof_lookup} will always return @code{NULL}.

There aren't separate @emph{start} and @emph{stop} events defined for the
event types @code{acc_ev_create}, @code{acc_ev_delete},
@code{acc_ev_alloc}, and @code{acc_ev_free}. It's not clear if these
should be triggered before or after the actual device-specific call is
made. We trigger them after.

Remarks about data provided to callbacks:

@table @asis

@item @code{acc_prof_info.event_type}
It's not clear if for @emph{nested} event callbacks (for example,
@code{acc_ev_enqueue_launch_start} as part of a parent compute
construct), this should be set for the nested event
(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
construct should remain (@code{acc_ev_compute_construct_start}). In
this implementation, the value will generally correspond to the
innermost nested event type.

@item @code{acc_prof_info.device_type}
@itemize

@item
For @code{acc_ev_compute_construct_start}, and in presence of an
@code{if} clause with @emph{false} argument, this will still refer to
the offloading device type.
It's not clear if that's the expected behavior.

@item
Complementary to the item before, for
@code{acc_ev_compute_construct_end}, this is set to
@code{acc_device_host} in presence of an @code{if} clause with
@emph{false} argument.
It's not clear if that's the expected behavior.

@end itemize

@item @code{acc_prof_info.thread_id}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.async}
@itemize

@item
Not yet implemented correctly for
@code{acc_ev_compute_construct_start}.

@item
In a compute construct, for host-fallback
execution/@code{acc_device_host} it will always be
@code{acc_async_sync}.
It's not clear if that's the expected behavior.

@item
For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
it will always be @code{acc_async_sync}.
It's not clear if that's the expected behavior.

@end itemize

@item @code{acc_prof_info.async_queue}
There is no @cite{limited number of asynchronous queues} in libgomp.
This will always have the same value as @code{acc_prof_info.async}.

@item @code{acc_prof_info.src_file}
Always @code{NULL}; not yet implemented.

@item @code{acc_prof_info.func_name}
Always @code{NULL}; not yet implemented.

@item @code{acc_prof_info.line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.end_line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.func_line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_prof_info.func_end_line_no}
Always @code{-1}; not yet implemented.

@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
Relating to @code{acc_prof_info.event_type} discussed above, in this
implementation, this will always be the same value as
@code{acc_prof_info.event_type}.

@item @code{acc_event_info.*.parent_construct}
@itemize

@item
Will be @code{acc_construct_parallel} for all OpenACC compute
constructs as well as many OpenACC Runtime API calls; should be the
one matching the actual construct, or
@code{acc_construct_runtime_api}, respectively.

@item
Will be @code{acc_construct_enter_data} or
@code{acc_construct_exit_data} when processing variable mappings
specified in OpenACC @emph{declare} directives; should be
@code{acc_construct_declare}.

@item
For implicit @code{acc_ev_device_init_start},
@code{acc_ev_device_init_end}, and explicit as well as implicit
@code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, will be
@code{acc_construct_parallel}; should reflect the real parent
construct.

@end itemize

@item @code{acc_event_info.*.implicit}
For @code{acc_ev_alloc}, @code{acc_ev_free},
@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
@code{acc_ev_enqueue_download_start}, and
@code{acc_ev_enqueue_download_end}, this currently will be @code{1}
also for explicit usage.

@item @code{acc_event_info.data_event.var_name}
Always @code{NULL}; not yet implemented.

@item @code{acc_event_info.data_event.host_ptr}
For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always
@code{NULL}.

@item @code{typedef union acc_api_info}
@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
Information}. This should obviously be @code{typedef @emph{struct}
acc_api_info}.

@item @code{acc_api_info.device_api}
Possibly not yet implemented correctly for
4158@code{acc_ev_compute_construct_start},
4159@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
4160will always be @code{acc_device_api_none} for these event types.
4161For @code{acc_ev_enter_data_start}, it will be
4162@code{acc_device_api_none} in some cases.
4163
4164@item @code{acc_api_info.device_type}
4165Always the same as @code{acc_prof_info.device_type}.
4166
4167@item @code{acc_api_info.vendor}
4168Always @code{-1}; not yet implemented.
4169
4170@item @code{acc_api_info.device_handle}
4171Always @code{NULL}; not yet implemented.
4172
4173@item @code{acc_api_info.context_handle}
4174Always @code{NULL}; not yet implemented.
4175
4176@item @code{acc_api_info.async_handle}
4177Always @code{NULL}; not yet implemented.
4178
4179@end table

Remarks about certain event types:

@table @asis

@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
@itemize

@item
@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
When a compute construct triggers implicit
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
events, they currently aren't @emph{nested within} the corresponding
@code{acc_ev_compute_construct_start} and
@code{acc_ev_compute_construct_end}, but they're currently observed
@emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do here: the standard asks us to provide a lot
of details to the @code{acc_ev_compute_construct_start} callback, but
how can we do that without (implicitly) initializing a device first?

@item
Callbacks for these event types will not be invoked for calls to the
@code{acc_set_device_type} and @code{acc_set_device_num} functions.
It's not clear if they should be.

@end itemize

@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
@itemize

@item
Callbacks for these event types will also be invoked for OpenACC
@emph{host_data} constructs.
It's not clear if they should be.

@item
Callbacks for these event types will also be invoked when processing
variable mappings specified in OpenACC @emph{declare} directives.
It's not clear if they should be.

@end itemize

@end table

Callbacks for the following event types will be invoked, but dispatch
and information provided therein have not yet been thoroughly reviewed:

@itemize
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
@end itemize

During device initialization, and finalization, respectively,
callbacks for the following event types will not yet be invoked:

@itemize
@item @code{acc_ev_alloc}
@item @code{acc_ev_free}
@end itemize

Callbacks for the following event types have not yet been implemented,
so currently won't be invoked:

@itemize
@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
@item @code{acc_ev_runtime_shutdown}
@item @code{acc_ev_create}, @code{acc_ev_delete}
@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
@end itemize

For the following runtime library functions, not all expected
callbacks will be invoked (mostly concerning implicit device
initialization):

@itemize
@item @code{acc_get_num_devices}
@item @code{acc_set_device_type}
@item @code{acc_get_device_type}
@item @code{acc_set_device_num}
@item @code{acc_get_device_num}
@item @code{acc_init}
@item @code{acc_shutdown}
@end itemize

Aside from implicit device initialization, for the following runtime
library functions, no callbacks will be invoked for shared-memory
offloading devices (it's not clear if they should be):

@itemize
@item @code{acc_malloc}
@item @code{acc_free}
@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
@item @code{acc_update_device}, @code{acc_update_device_async}
@item @code{acc_update_self}, @code{acc_update_self_async}
@item @code{acc_map_data}, @code{acc_unmap_data}
@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
@end itemize

@c ---------------------------------------------------------------------
@c OpenMP-Implementation Specifics
@c ---------------------------------------------------------------------

@node OpenMP-Implementation Specifics
@chapter OpenMP-Implementation Specifics

@menu
* OpenMP Context Selectors::
* Memory allocation with libmemkind::
@end menu

@node OpenMP Context Selectors
@section OpenMP Context Selectors

@code{vendor} is always @code{gnu}. References are to the GCC manual.

@multitable @columnfractions .60 .10 .25
@headitem @code{arch} @tab @code{kind} @tab @code{isa}
@item @code{x86}, @code{x86_64}, @code{i386}, @code{i486},
      @code{i586}, @code{i686}, @code{ia32}
      @tab @code{host}
      @tab See @code{-m...} flags in ``x86 Options'' (without @code{-m})
@item @code{amdgcn}, @code{gcn}
      @tab @code{gpu}
      @tab See @code{-march=} in ``AMD GCN Options''
@item @code{nvptx}
      @tab @code{gpu}
      @tab See @code{-march=} in ``Nvidia PTX Options''
@end multitable

@node Memory allocation with libmemkind
@section Memory allocation with libmemkind

On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
library} (@code{libmemkind.so.0}) is available at runtime, it is used when
creating memory allocators requesting

@itemize
@item the memory space @code{omp_high_bw_mem_space}
@item the memory space @code{omp_large_cap_mem_space}
@item the partition trait @code{omp_atv_interleaved}
@end itemize


@c ---------------------------------------------------------------------
@c Offload-Target Specifics
@c ---------------------------------------------------------------------

@node Offload-Target Specifics
@chapter Offload-Target Specifics

The following sections present notes on the offload-target specifics.

@menu
* AMD Radeon::
* nvptx::
@end menu

@node AMD Radeon
@section AMD Radeon (GCN)

On the hardware side, there is the hierarchy (fine to coarse):
@itemize
@item work item (thread)
@item wavefront
@item work group
@item compute unit (CU)
@end itemize

All OpenMP and OpenACC levels are used, i.e.
@itemize
@item OpenMP's simd and OpenACC's vector map to work items (threads)
@item OpenMP's threads (``parallel'') and OpenACC's workers map
      to wavefronts
@item OpenMP's teams and OpenACC's gang use a threadpool with the
      size of the number of teams or gangs, respectively.
@end itemize

The used sizes are
@itemize
@item The number of teams is the specified @code{num_teams} (OpenMP)
      or @code{num_gangs} (OpenACC), or otherwise the number of CUs
@item The number of wavefronts is 4 for gfx900 and 16 otherwise;
      @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
      override this if smaller.
@item Each wavefront has 102 scalar registers and 64 vector registers
@item The number of work items is always 64
@item The hardware permits maximally 40 workgroups/CU and
      16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
@item 80 scalar registers and 24 vector registers are used in non-kernel
      functions (the chosen procedure-calling API).
@item For the kernel itself: as many as register pressure demands (number of
      teams and number of threads, scaled down if registers are exhausted)
@end itemize

Implementation remarks:
@itemize
@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
      using the C library @code{printf} functions and the Fortran
      @code{print}/@code{write} statements.
@end itemize



@node nvptx
@section nvptx

On the hardware side, there is the hierarchy (fine to coarse):
@itemize
@item thread
@item warp
@item thread block
@item streaming multiprocessor
@end itemize

All OpenMP and OpenACC levels are used, i.e.
@itemize
@item OpenMP's simd and OpenACC's vector map to threads
@item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
@item OpenMP's teams and OpenACC's gang use a threadpool with the
      size of the number of teams or gangs, respectively.
@end itemize

The used sizes are
@itemize
@item The @code{warp_size} is always 32
@item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
@end itemize

Additional information can be obtained by setting the environment variable
@code{GOMP_DEBUG=1} (very verbose; grep for @code{kernel.*launch} for launch
parameters).

GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA;
the JIT result is cached in the user's directory (see the CUDA documentation;
this can be tuned via the environment variables
@code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).

Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=}
command-line options still affect the generated PTX ISA code and, thus, the
requirements on CUDA version and hardware.

Implementation remarks:
@itemize
@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
      using the C library @code{printf} functions. Note that the Fortran
      @code{print}/@code{write} statements are not supported, yet.
@item Compiling OpenMP code that contains @code{requires reverse_offload}
      requires at least @code{-march=sm_35}; compiling for @code{-march=sm_30}
      is not supported.
@end itemize


@c ---------------------------------------------------------------------
@c The libgomp ABI
@c ---------------------------------------------------------------------

@node The libgomp ABI
@chapter The libgomp ABI

The following sections present notes on the external ABI as
presented by libgomp. Only maintainers should need them.

@menu
* Implementing MASTER construct::
* Implementing CRITICAL construct::
* Implementing ATOMIC construct::
* Implementing FLUSH construct::
* Implementing BARRIER construct::
* Implementing THREADPRIVATE construct::
* Implementing PRIVATE clause::
* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
* Implementing REDUCTION clause::
* Implementing PARALLEL construct::
* Implementing FOR construct::
* Implementing ORDERED construct::
* Implementing SECTIONS construct::
* Implementing SINGLE construct::
* Implementing OpenACC's PARALLEL construct::
@end menu


@node Implementing MASTER construct
@section Implementing MASTER construct

@smallexample
if (omp_get_thread_num () == 0)
  block
@end smallexample

Alternately, we generate two copies of the parallel subfunction
and only include this in the version run by the primary thread.
Surely this is not worthwhile though...



@node Implementing CRITICAL construct
@section Implementing CRITICAL construct

Without a specified name,

@smallexample
  void GOMP_critical_start (void);
  void GOMP_critical_end (void);
@end smallexample

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

@smallexample
  omp_lock_t gomp_critical_user_<name> __attribute__((common))
@end smallexample

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't need to initialize this at
startup.



@node Implementing ATOMIC construct
@section Implementing ATOMIC construct

The target should implement the @code{__sync} builtins.

Failing that we could add

@smallexample
  void GOMP_atomic_enter (void)
  void GOMP_atomic_exit (void)
@end smallexample

which reuses the regular lock code, but with yet another lock
object private to the library.



@node Implementing FLUSH construct
@section Implementing FLUSH construct

Expands to the @code{__sync_synchronize} builtin.



@node Implementing BARRIER construct
@section Implementing BARRIER construct

@smallexample
  void GOMP_barrier (void)
@end smallexample


@node Implementing THREADPRIVATE construct
@section Implementing THREADPRIVATE construct

In @emph{most} cases we can map this directly to @code{__thread}.
Except that OMP allows constructors for C++ objects. We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library. Failing that, we can have a set
of entry points to register ctor functions to be called.



@node Implementing PRIVATE clause
@section Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function. This preserves
the semantic of new variable creation.



@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses

This seems simple enough for PARALLEL blocks. Create a private
struct for communicating between the parent and subfunction.
In the parent, copy in values for scalars and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
subfunction, copy the value into the local variable.

It is not clear what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like:

@smallexample
#pragma omp for firstprivate(x) lastprivate(y)
for (int i = 0; i < n; ++i)
  body;
@end smallexample

which becomes

@smallexample
@{
  int x = x, y;

  // for stuff

  if (i == n)
    y = y;
@}
@end smallexample

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C. Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.



@node Implementing REDUCTION clause
@section Implementing REDUCTION clause

The private struct mentioned in the previous section should have
a pointer to an array of the type of the variable, indexed by the
thread's @var{team_id}. The thread stores its final value into the
array, and after the barrier, the primary thread iterates over the
array to collect the values.


@node Implementing PARALLEL construct
@section Implementing PARALLEL construct

@smallexample
  #pragma omp parallel
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    use data;
    body;
  @}

  setup data;
  GOMP_parallel_start (subfunction, &data, num_threads);
  subfunction (&data);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
@end smallexample

The @var{FN} argument is the subfunction to be run in parallel.

The @var{DATA} argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above with respect to FIRSTPRIVATE et al.

The @var{NUM_THREADS} argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock. It needs to
create the team structure and assign team ids.

@smallexample
  void GOMP_parallel_end (void)
@end smallexample

Tears down the team and returns us to the previous @code{omp_in_parallel()} state.



@node Implementing FOR construct
@section Implementing FOR construct

@smallexample
  #pragma omp parallel for
  for (i = lb; i <= ub; i++)
    body;
@end smallexample

becomes

@smallexample
  void subfunction (void *data)
  @{
    long _s0, _e0;
    while (GOMP_loop_static_next (&_s0, &_e0))
      @{
        long _e1 = _e0, i;
        for (i = _s0; i < _e1; i++)
          body;
      @}
    GOMP_loop_end_nowait ();
  @}

  GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
  subfunction (NULL);
  GOMP_parallel_end ();
@end smallexample

@smallexample
  #pragma omp for schedule(runtime)
  for (i = 0; i < n; i++)
    body;
@end smallexample

becomes

@smallexample
  @{
    long i, _s0, _e0;
    if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
      do @{
        long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
    GOMP_loop_end ();
  @}
@end smallexample

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really. We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables. So the expression should remain evaluable in the
subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we need not.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations, in which case we wouldn't need to call any of these
routines.

There are separate routines for handling loops with an ORDERED
clause. Bookkeeping for that is non-trivial...



@node Implementing ORDERED construct
@section Implementing ORDERED construct

@smallexample
  void GOMP_ordered_start (void)
  void GOMP_ordered_end (void)
@end smallexample



@node Implementing SECTIONS construct
@section Implementing SECTIONS construct

A block as

@smallexample
  #pragma omp sections
  @{
    #pragma omp section
    stmt1;
    #pragma omp section
    stmt2;
    #pragma omp section
    stmt3;
  @}
@end smallexample

becomes

@smallexample
  for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
    switch (i)
      @{
      case 1:
        stmt1;
        break;
      case 2:
        stmt2;
        break;
      case 3:
        stmt3;
        break;
      @}
  GOMP_barrier ();
@end smallexample


@node Implementing SINGLE construct
@section Implementing SINGLE construct

A block like

@smallexample
  #pragma omp single
  @{
    body;
  @}
@end smallexample

becomes

@smallexample
  if (GOMP_single_start ())
    body;
  GOMP_barrier ();
@end smallexample

while

@smallexample
  #pragma omp single copyprivate(x)
    body;
@end smallexample

becomes

@smallexample
  datap = GOMP_single_copy_start ();
  if (datap == NULL)
    @{
      body;
      data.x = x;
      GOMP_single_copy_end (&data);
    @}
  else
    x = datap->x;
  GOMP_barrier ();
@end smallexample



@node Implementing OpenACC's PARALLEL construct
@section Implementing OpenACC's PARALLEL construct

@smallexample
  void GOACC_parallel ()
@end smallexample



@c ---------------------------------------------------------------------
@c Reporting Bugs
@c ---------------------------------------------------------------------

@node Reporting Bugs
@chapter Reporting Bugs

Bugs in the GNU Offloading and Multi Processing Runtime Library should
be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
"openacc" or "openmp", or both, to the keywords field in the bug
report, as appropriate.



@c ---------------------------------------------------------------------
@c GNU General Public License
@c ---------------------------------------------------------------------

@include gpl_v3.texi



@c ---------------------------------------------------------------------
@c GNU Free Documentation License
@c ---------------------------------------------------------------------

@include fdl.texi



@c ---------------------------------------------------------------------
@c Funding Free Software
@c ---------------------------------------------------------------------

@include funding.texi

@c ---------------------------------------------------------------------
@c Index
@c ---------------------------------------------------------------------

@node Library Index
@unnumbered Library Index

@printindex cp

@bye