@copying
-Copyright @copyright{} 2006-2014 Free Software Foundation, Inc.
+Copyright @copyright{} 2006-2019 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
@ifinfo
@dircategory GNU Libraries
@direntry
-* libgomp: (libgomp). GNU OpenMP runtime library
+* libgomp: (libgomp). GNU Offloading and Multi Processing Runtime Library.
@end direntry
-This manual documents the GNU implementation of the OpenMP API for
-multi-platform shared-memory parallel programming in C/C++ and Fortran.
+This manual documents libgomp, the GNU Offloading and Multi Processing
+Runtime library. This is the GNU implementation of the OpenMP and
+OpenACC APIs for parallel and accelerator programming in C/C++ and
+Fortran.
Published by the Free Software Foundation
51 Franklin Street, Fifth Floor
@setchapternewpage odd
@titlepage
-@title The GNU OpenMP Implementation
+@title GNU Offloading and Multi Processing Runtime Library
+@subtitle The GNU OpenMP and OpenACC Implementation
@page
@vskip 0pt plus 1filll
@comment For the @value{version-GCC} Version*
@top Introduction
@cindex Introduction
-This manual documents the usage of libgomp, the GNU implementation of the
-@uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
-for multi-platform shared-memory parallel programming in C/C++ and Fortran.
+This manual documents the usage of libgomp, the GNU Offloading and
+Multi Processing Runtime Library. This includes the GNU
+implementation of the @uref{https://www.openmp.org, OpenMP} Application
+Programming Interface (API) for multi-platform shared-memory parallel
+programming in C/C++ and Fortran, and the GNU implementation of the
+@uref{https://www.openacc.org, OpenACC} Application Programming
+Interface (API) for offloading of code to accelerator devices in C/C++
+and Fortran.
+
+Originally, libgomp implemented the GNU OpenMP Runtime Library. Based
+on this, support for OpenACC and offloading (both OpenACC and OpenMP
+4's target construct) has been added later on, and the library's name
+changed to GNU Offloading and Multi Processing Runtime Library.
@comment
@menu
* Enabling OpenMP:: How to enable OpenMP for your applications.
-* Runtime Library Routines:: The OpenMP runtime application programming
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+ The OpenMP runtime application programming
interface.
-* Environment Variables:: Influencing runtime behavior with environment
- variables.
+* OpenMP Environment Variables: Environment Variables.
+ Influencing OpenMP runtime behavior with
+ environment variables.
+* Enabling OpenACC:: How to enable OpenACC for your
+ applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+ programming interface.
+* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
+ environment variables.
+* CUDA Streams Usage:: Notes on the implementation of
+ asynchronous operations.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+ NVIDIA CUBLAS library.
+* OpenACC Profiling Interface::
* The libgomp ABI:: Notes on the external ABI presented by libgomp.
-* Reporting Bugs:: How to report bugs in GNU OpenMP.
+* Reporting Bugs:: How to report bugs in the GNU Offloading and
+ Multi Processing Runtime Library.
* Copying:: GNU general public license says
how you can copy and share libgomp.
* GNU Free Documentation License::
(@ref{Runtime Library Routines}).
A complete description of all OpenMP directives accepted may be found in
-the @uref{http://www.openmp.org, OpenMP Application Program Interface} manual,
-version 4.0.
+the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
+version 4.5.
@c ---------------------------------------------------------------------
-@c Runtime Library Routines
+@c OpenMP Runtime Library Routines
@c ---------------------------------------------------------------------
@node Runtime Library Routines
-@chapter Runtime Library Routines
+@chapter OpenMP Runtime Library Routines
The runtime routines described here are defined by Section 3 of the OpenMP
-specification in version 4.0. The routines are structured in following
+specification in version 4.5. The routines are structured in following
three parts:
+@menu
Control threads, processors and the parallel environment. They have C
linkage, and do not throw exceptions.
-@menu
* omp_get_active_level:: Number of active parallel regions
* omp_get_ancestor_thread_num:: Ancestor thread ID
* omp_get_cancellation:: Whether cancellation support is enabled
* omp_get_dynamic:: Dynamic teams setting
* omp_get_level:: Number of parallel regions
* omp_get_max_active_levels:: Maximum number of active regions
+* omp_get_max_task_priority:: Maximum task priority value that can be set
* omp_get_max_threads:: Maximum number of threads of parallel region
* omp_get_nested:: Nested parallel regions
* omp_get_num_devices:: Number of target devices
* omp_set_nested:: Enable/disable nested parallel regions
* omp_set_num_threads:: Set upper team size limit
* omp_set_schedule:: Set the runtime scheduling method
-@end menu
Initialize, set, test, unset and destroy simple and nested locks.
-@menu
* omp_init_lock:: Initialize simple lock
* omp_set_lock:: Wait for and set simple lock
* omp_test_lock:: Test and set simple lock if available
* omp_test_nest_lock:: Test and set nested lock if available
* omp_unset_nest_lock:: Unset nested lock
* omp_destroy_nest_lock:: Destroy nested lock
-@end menu
Portable, thread-based, wall clock timer.
-@menu
* omp_get_wtick:: Get timer precision.
* omp_get_wtime:: Elapsed wall clock time.
@end menu
@ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.20.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
@end table
@ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.18.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
@end table
@ref{OMP_CANCELLATION}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.9.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
@end table
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.24.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@end table
@ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.8.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
@end table
@ref{omp_get_active_level}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.17.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
@end table
@ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.16.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
@end table
+@node omp_get_max_task_priority
+@section @code{omp_get_max_task_priority} -- Maximum priority value
+that can be set for tasks.
+@table @asis
+@item @emph{Description}:
+This function obtains the maximum allowed priority number for tasks.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
+@end table
+
@node omp_get_max_threads
@section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.3.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
@end table
@ref{omp_set_nested}, @ref{OMP_NESTED}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.11.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
@end table
@end multitable
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.25.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
@end table
@end multitable
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.5.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
@end table
@end multitable
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.26.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
@end table
@ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
@end table
@ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY},
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.22.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
@end table
Obtain the runtime scheduling method. The @var{kind} argument will be
set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}. The second argument,
-@var{modifier}, is set to the chunk size.
+@var{chunk_size}, is set to the chunk size.
@item @emph{C/C++}
@multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *modifier);}
+@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
@end multitable
@item @emph{Fortran}:
@multitable @columnfractions .20 .80
-@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, modifier)}
+@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
-@item @tab @code{integer modifier}
+@item @tab @code{integer chunk_size}
@end multitable
@item @emph{See also}:
@ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.13.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
@end table
@end multitable
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.27.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
@end table
@ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.19.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
@end table
@ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.14.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
@end table
@ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.4.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
@end table
@end multitable
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.6.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
@end table
@end multitable
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.21.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
@end table
@end multitable
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.28.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
@end table
@ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.23.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
@end table
@ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.7.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
@end table
@ref{omp_get_max_active_levels}, @ref{omp_get_active_level}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.15.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
@end table
@ref{OMP_NESTED}, @ref{omp_get_nested}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.10.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
@end table
@ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
@end table
value @code{omp_sched_static}, @code{omp_sched_dynamic},
@code{omp_sched_guided} or @code{omp_sched_auto}. Except for
@code{omp_sched_auto}, the chunk size is set to the value of
-@var{modifier} if positive, or to the default value if zero or negative.
-For @code{omp_sched_auto} the @var{modifier} argument is ignored.
+@var{chunk_size} if positive, or to the default value if zero or negative.
+For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.
@item @emph{C/C++}
@multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int modifier);}
+@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
@end multitable
@item @emph{Fortran}:
@multitable @columnfractions .20 .80
-@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, modifier)}
+@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
@item @tab @code{integer(kind=omp_sched_kind) kind}
-@item @tab @code{integer modifier}
+@item @tab @code{integer chunk_size}
@end multitable
@item @emph{See also}:
@ref{OMP_SCHEDULE}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.12.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
@end table
@ref{omp_destroy_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table
@ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table
@ref{omp_set_lock}, @ref{omp_test_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table
@ref{omp_init_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table
@ref{omp_destroy_nest_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
@end table
@ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
@end table
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
@end table
@ref{omp_set_nest_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
@end table
@ref{omp_init_lock}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
@end table
@ref{omp_get_wtime}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
@end table
@ref{omp_get_wtick}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
@end table
@c ---------------------------------------------------------------------
-@c Environment Variables
+@c OpenMP Environment Variables
@c ---------------------------------------------------------------------
@node Environment Variables
-@chapter Environment Variables
+@chapter OpenMP Environment Variables
The environment variables which beginning with @env{OMP_} are defined by
-section 4 of the OpenMP specification in version 4.0, while those
+section 4 of the OpenMP specification in version 4.5, while those
beginning with @env{GOMP_} are GNU extensions.
@menu
-* OMP_CANCELLATION:: Set whether cancellation is activated
-* OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
-* OMP_DEFAULT_DEVICE:: Set the device used in target regions
-* OMP_DYNAMIC:: Dynamic adjustment of threads
-* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
-* OMP_NESTED:: Nested parallel regions
-* OMP_NUM_THREADS:: Specifies the number of threads to use
-* OMP_PROC_BIND:: Whether theads may be moved between CPUs
-* OMP_PLACES:: Specifies on which CPUs the theads should be placed
-* OMP_STACKSIZE:: Set default thread stack size
-* OMP_SCHEDULE:: How threads are scheduled
-* OMP_THREAD_LIMIT:: Set the maximum number of threads
-* OMP_WAIT_POLICY:: How waiting threads are handled
-* GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
-* GOMP_STACKSIZE:: Set default thread stack size
-* GOMP_SPINCOUNT:: Set the busy-wait spin count
+* OMP_CANCELLATION:: Set whether cancellation is activated
+* OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
+* OMP_DEFAULT_DEVICE:: Set the device used in target regions
+* OMP_DYNAMIC:: Dynamic adjustment of threads
+* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
+* OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
+* OMP_NESTED:: Nested parallel regions
+* OMP_NUM_THREADS:: Specifies the number of threads to use
+* OMP_PROC_BIND:: Whether theads may be moved between CPUs
+* OMP_PLACES:: Specifies on which CPUs the theads should be placed
+* OMP_STACKSIZE:: Set default thread stack size
+* OMP_SCHEDULE:: How threads are scheduled
+* OMP_THREAD_LIMIT:: Set the maximum number of threads
+* OMP_WAIT_POLICY:: How waiting threads are handled
+* GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
+* GOMP_DEBUG:: Enable debugging output
+* GOMP_STACKSIZE:: Set default thread stack size
+* GOMP_SPINCOUNT:: Set the busy-wait spin count
+* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
@end menu
@ref{omp_get_cancellation}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
@end table
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.12
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
@end table
@ref{omp_get_default_device}, @ref{omp_set_default_device},
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
@end table
@ref{omp_set_dynamic}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.3
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
@end table
@ref{omp_set_max_active_levels}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.9
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
+@end table
+
+
+
+@node OMP_MAX_TASK_PRIORITY
+@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
+number that can be set for a task.
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Specifies the initial value for the maximum priority value that can be
+set for a task. The value of this variable shall be a non-negative
+integer, and zero is allowed. If undefined, the default priority is
+0.
+
+@item @emph{See also}:
+@ref{omp_get_max_task_priority}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
@end table
@ref{omp_set_nested}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.6
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
@end table
@ref{omp_set_num_threads}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.2
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
+@end table
+
+
+
+@node OMP_PROC_BIND
+@section @env{OMP_PROC_BIND} -- Whether theads may be moved between CPUs
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Specifies whether threads may be moved between processors. If set to
+@code{TRUE}, OpenMP theads should not be moved; if set to @code{FALSE}
+they may be moved. Alternatively, a comma separated list with the
+values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
+the thread affinity policy for the corresponding nesting level. With
+@code{MASTER} the worker threads are in the same place partition as the
+master thread. With @code{CLOSE} those are kept close to the master thread
+in contiguous place partitions. And with @code{SPREAD} a sparse distribution
+across the place partitions is used.
+
+When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
+@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
+
+@item @emph{See also}:
+@ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
@end table
@ref{OMP_DISPLAY_ENV}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.5
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
@end table
-@node OMP_PROC_BIND
-@section @env{OMP_PROC_BIND} -- Whether theads may be moved between CPUs
+@node OMP_STACKSIZE
+@section @env{OMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
@table @asis
@item @emph{Description}:
-Specifies whether threads may be moved between processors. If set to
-@code{TRUE}, OpenMP theads should not be moved; if set to @code{FALSE}
-they may be moved. Alternatively, a comma separated list with the
-values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
-the thread affinity policy for the corresponding nesting level. With
-@code{MASTER} the worker threads are in the same place partition as the
-master thread. With @code{CLOSE} those are kept close to the master thread
-in contiguous place partitions. And with @code{SPREAD} a sparse distribution
-across the place partitions is used.
-
-When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
-@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
-
-@item @emph{See also}:
-@ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
+Set the default thread stack size in kilobytes, unless the number
+is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
+case the size is, respectively, in bytes, kilobytes, megabytes
+or gigabytes. This is different from @code{pthread_attr_setstacksize}
+which gets the number of bytes as an argument. If the stack size cannot
+be set due to system constraints, an error is reported and the initial
+stack size is left unchanged. If undefined, the stack size is system
+dependent.
-@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.4
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
@end table
@ref{omp_set_schedule}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Sections 2.7.1 and 4.1
-@end table
-
-
-
-@node OMP_STACKSIZE
-@section @env{OMP_STACKSIZE} -- Set default thread stack size
-@cindex Environment Variable
-@table @asis
-@item @emph{Description}:
-Set the default thread stack size in kilobytes, unless the number
-is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
-case the size is, respectively, in bytes, kilobytes, megabytes
-or gigabytes. This is different from @code{pthread_attr_setstacksize}
-which gets the number of bytes as an argument. If the stack size cannot
-be set due to system constraints, an error is reported and the initial
-stack size is left unchanged. If undefined, the stack size is system
-dependent.
-
-@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.7
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
@end table
@ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.10
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
@end table
@ref{GOMP_SPINCOUNT}
@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.8
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
@end table
and 14 respectively and then start assigning back from the beginning of
the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
-There is no GNU OpenMP library routine to determine whether a CPU affinity
+There is no libgomp library routine to determine whether a CPU affinity
specification is in effect. As a workaround, language-specific library
functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
+@node GOMP_DEBUG
+@section @env{GOMP_DEBUG} -- Enable debugging output
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Enable debugging output. The variable should be set to @code{0}
+(disabled, also the default if not set), or @code{1} (enabled).
+
+If enabled, some debugging output will be printed during execution.
+This is currently not specified in more detail, and subject to change.
+@end table
+
+
+
@node GOMP_STACKSIZE
@section @env{GOMP_STACKSIZE} -- Set default thread stack size
@cindex Environment Variable
+@node GOMP_RTEMS_THREAD_POOLS
+@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
+@cindex Environment Variable
+@cindex Implementation specific setting
+@table @asis
+@item @emph{Description}:
+This environment variable is only used on the RTEMS real-time operating system.
+It determines the scheduler instance specific thread pools. The format for
+@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
+@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
+separated by @code{:} where:
+@itemize @bullet
+@item @code{<thread-pool-count>} is the thread pool count for this scheduler
+instance.
+@item @code{$<priority>} is an optional priority for the worker threads of a
+thread pool according to @code{pthread_setschedparam}. In case a priority
+value is omitted, then a worker thread will inherit the priority of the OpenMP
+master thread that created it. The priority of the worker thread is not
+changed after creation, even if a new OpenMP master thread using the worker has
+a different priority.
+@item @code{@@<scheduler-name>} is the scheduler instance name according to the
+RTEMS application configuration.
+@end itemize
+In case no thread pool configuration is specified for a scheduler instance,
+then each OpenMP master thread of this scheduler instance will use its own
+dynamically allocated thread pool. To limit the worker thread count of the
+thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
+@item @emph{Example}:
+Lets suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
+@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
+@code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
+scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
+one thread pool available. Since no priority is specified for this scheduler
+instance, the worker thread inherits the priority of the OpenMP master thread
+that created it. In the scheduler instance @code{WRK1} there are three thread
+pools available and their worker threads run at priority four.
+@end table
+
+
+
@c ---------------------------------------------------------------------
-@c The libgomp ABI
+@c Enabling OpenACC
@c ---------------------------------------------------------------------
-@node The libgomp ABI
-@chapter The libgomp ABI
+@node Enabling OpenACC
+@chapter Enabling OpenACC
-The following sections present notes on the external ABI as
-presented by libgomp. Only maintainers should need them.
+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
+flag @option{-fopenacc} must be specified. This enables the OpenACC directive
+@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
+@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
+@code{!$} conditional compilation sentinels in free form and @code{c$},
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
+arranges for automatic linking of the OpenACC runtime library
+(@ref{OpenACC Runtime Library Routines}).
-@menu
-* Implementing MASTER construct::
-* Implementing CRITICAL construct::
-* Implementing ATOMIC construct::
-* Implementing FLUSH construct::
-* Implementing BARRIER construct::
-* Implementing THREADPRIVATE construct::
-* Implementing PRIVATE clause::
-* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
-* Implementing REDUCTION clause::
-* Implementing PARALLEL construct::
-* Implementing FOR construct::
-* Implementing ORDERED construct::
-* Implementing SECTIONS construct::
-* Implementing SINGLE construct::
-@end menu
+A complete description of all OpenACC directives accepted may be found in
+the @uref{https://www.openacc.org, OpenACC} Application Programming
+Interface manual, version 2.0.
+Note that this is an experimental feature and subject to
+change in future versions of GCC. See
+@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
-@node Implementing MASTER construct
-@section Implementing MASTER construct
-@smallexample
-if (omp_get_thread_num () == 0)
- block
-@end smallexample
-Alternately, we generate two copies of the parallel subfunction
-and only include this in the version run by the master thread.
-Surely this is not worthwhile though...
+@c ---------------------------------------------------------------------
+@c OpenACC Runtime Library Routines
+@c ---------------------------------------------------------------------
+@node OpenACC Runtime Library Routines
+@chapter OpenACC Runtime Library Routines
+The runtime routines described here are defined by section 3 of the OpenACC
+specifications in version 2.0.
+They have C linkage, and do not throw exceptions.
+Generally, they are available only for the host, with the exception of
+@code{acc_on_device}, which is available for both the host and the
+acceleration device.
-@node Implementing CRITICAL construct
-@section Implementing CRITICAL construct
+@menu
+* acc_get_num_devices:: Get number of devices for the given device
+ type.
+* acc_set_device_type:: Set type of device accelerator to use.
+* acc_get_device_type:: Get type of device accelerator to be used.
+* acc_set_device_num:: Set device number to use.
+* acc_get_device_num:: Get device number to be used.
+* acc_async_test:: Tests for completion of a specific asynchronous
+ operation.
+* acc_async_test_all:: Tests for completion of all asychronous
+ operations.
+* acc_wait:: Wait for completion of a specific asynchronous
+ operation.
+* acc_wait_all:: Waits for completion of all asyncrhonous
+ operations.
+* acc_wait_all_async:: Wait for completion of all asynchronous
+ operations.
+* acc_wait_async:: Wait for completion of asynchronous operations.
+* acc_init:: Initialize runtime for a specific device type.
+* acc_shutdown:: Shuts down the runtime for a specific device
+ type.
+* acc_on_device:: Whether executing on a particular device
+* acc_malloc:: Allocate device memory.
+* acc_free:: Free device memory.
+* acc_copyin:: Allocate device memory and copy host memory to
+ it.
+* acc_present_or_copyin:: If the data is not present on the device,
+ allocate device memory and copy from host
+ memory.
+* acc_create:: Allocate device memory and map it to host
+ memory.
+* acc_present_or_create:: If the data is not present on the device,
+ allocate device memory and map it to host
+ memory.
+* acc_copyout:: Copy device memory to host memory.
+* acc_delete:: Free device memory.
+* acc_update_device:: Update device memory from mapped host memory.
+* acc_update_self:: Update host memory from mapped device memory.
+* acc_map_data:: Map previously allocated device memory to host
+ memory.
+* acc_unmap_data:: Unmap device memory from host memory.
+* acc_deviceptr:: Get device pointer associated with specific
+ host address.
+* acc_hostptr:: Get host pointer associated with specific
+ device address.
+* acc_is_present:: Indiciate whether host variable / array is
+ present on device.
+* acc_memcpy_to_device:: Copy host memory to device memory.
+* acc_memcpy_from_device:: Copy device memory to host memory.
+
+API routines for target platforms.
+
+* acc_get_current_cuda_device:: Get CUDA device handle.
+* acc_get_current_cuda_context::Get CUDA context handle.
+* acc_get_cuda_stream:: Get CUDA stream handle.
+* acc_set_cuda_stream:: Set CUDA stream handle.
+
+API routines for the OpenACC Profiling Interface.
+
+* acc_prof_register:: Register callbacks.
+* acc_prof_unregister:: Unregister callbacks.
+* acc_prof_lookup:: Obtain inquiry functions.
+* acc_register_library:: Library registration.
+@end menu
-Without a specified name,
-@smallexample
- void GOMP_critical_start (void);
- void GOMP_critical_end (void);
-@end smallexample
-so that we don't get COPY relocations from libgomp to the main
-application.
+@node acc_get_num_devices
+@section @code{acc_get_num_devices} -- Get number of devices for given device type
+@table @asis
+@item @emph{Description}
+This function returns a value indicating the number of devices available
+for the device type specified in @var{devicetype}.
-With a specified name, use omp_set_lock and omp_unset_lock with
-name being transformed into a variable declared like
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
+@end multitable
-@smallexample
- omp_lock_t gomp_critical_user_<name> __attribute__((common))
-@end smallexample
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
+@item @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
-Ideally the ABI would specify that all zero is a valid unlocked
-state, and so we wouldn't need to initialize this at
-startup.
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.1.
+@end table
-@node Implementing ATOMIC construct
-@section Implementing ATOMIC construct
+@node acc_set_device_type
+@section @code{acc_set_device_type} -- Set type of device accelerator to use.
+@table @asis
+@item @emph{Description}
+This function indicates to the runtime library which device typr, specified
+in @var{devicetype}, to use when executing a parallel or kernels region.
-The target should implement the @code{__sync} builtins.
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
+@end multitable
-Failing that we could add
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
+@item @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
-@smallexample
- void GOMP_atomic_enter (void)
- void GOMP_atomic_exit (void)
-@end smallexample
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.2.
+@end table
-which reuses the regular lock code, but with yet another lock
-object private to the library.
+@node acc_get_device_type
+@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
+@table @asis
+@item @emph{Description}
+This function returns what device type will be used when executing a
+parallel or kernels region.
-@node Implementing FLUSH construct
-@section Implementing FLUSH construct
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
+@end multitable
-Expands to the @code{__sync_synchronize} builtin.
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_type(void)}
+@item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
+@end multitable
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.3.
+@end table
-@node Implementing BARRIER construct
-@section Implementing BARRIER construct
-@smallexample
- void GOMP_barrier (void)
-@end smallexample
+@node acc_set_device_num
+@section @code{acc_set_device_num} -- Set device number to use.
+@table @asis
+@item @emph{Description}
+This function will indicate to the runtime which device number,
+specified by @var{num}, associated with the specifed device
+type @var{devicetype}.
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
+@end multitable
-@node Implementing THREADPRIVATE construct
-@section Implementing THREADPRIVATE construct
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
+@item @tab @code{integer devicenum}
+@item @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
-In _most_ cases we can map this directly to @code{__thread}. Except
-that OMP allows constructors for C++ objects. We can either
-refuse to support this (how often is it used?) or we can
-implement something akin to .ctors.
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.4.
+@end table
-Even more ideally, this ctor feature is handled by extensions
-to the main pthreads library. Failing that, we can have a set
-of entry points to register ctor functions to be called.
+@node acc_get_device_num
+@section @code{acc_get_device_num} -- Get device number to be used.
+@table @asis
+@item @emph{Description}
+This function returns which device number associated with the specified device
+type @var{devicetype}, will be used when executing a parallel or kernels
+region.
-@node Implementing PRIVATE clause
-@section Implementing PRIVATE clause
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
+@end multitable
-In association with a PARALLEL, or within the lexical extent
-of a PARALLEL block, the variable becomes a local variable in
-the parallel subfunction.
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
+@item @tab @code{integer(kind=acc_device_kind) devicetype}
+@item @tab @code{integer acc_get_device_num}
+@end multitable
-In association with FOR or SECTIONS blocks, create a new
-automatic variable within the current function. This preserves
-the semantic of new variable creation.
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.5.
+@end table
-@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
+@node acc_async_test
+@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function tests for completion of the asynchrounous operation specified
+in @var{arg}. In C/C++, a non-zero value will be returned to indicate
+the specified asynchronous operation has completed. While Fortran will return
+a @code{true}. If the asynchrounous operation has not completed, C/C++ returns
+a zero and Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
+@item @tab @code{integer(kind=acc_handle_kind) arg}
+@item @tab @code{logical acc_async_test}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.6.
+@end table
+
+
+
+@node acc_async_test_all
+@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function tests for completion of all asynchrounous operations.
+In C/C++, a non-zero value will be returned to indicate all asynchronous
+operations have completed. While Fortran will return a @code{true}. If
+any asynchronous operation has not completed, C/C++ returns a zero and
+Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test()}
+@item @tab @code{logical acc_get_device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.7.
+@end table
+
+
+
+@node acc_wait
+@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function waits for completion of the asynchronous operation
+specified in @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait(arg);}
+@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
+@item @tab @code{integer(acc_handle_kind) arg}
+@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
+@item @tab @code{integer(acc_handle_kind) arg}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.8.
+@end table
+
+
+
+@node acc_wait_all
+@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function waits for the completion of all asynchronous operations.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
+@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
+@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.10.
+@end table
+
+
+
+@node acc_wait_all_async
+@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on the queue @var{async} for any
+and all asynchronous operations that have been previously enqueued on
+any queue.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
+@item @tab @code{integer(acc_handle_kind) async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.11.
+@end table
+
+
+
+@node acc_wait_async
+@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on queue @var{async} for any and all
+asynchronous operations enqueued on queue @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
+@item @tab @code{integer(acc_handle_kind) arg, async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.9.
+@end table
+
+
+
+@node acc_init
+@section @code{acc_init} -- Initialize runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function initializes the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
+@item @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.12.
+@end table
+
+
+
+@node acc_shutdown
+@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function shuts down the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
+@item @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.13.
+@end table
+
+
+
+@node acc_on_device
+@section @code{acc_on_device} -- Whether executing on a particular device
+@table @asis
+@item @emph{Description}:
+This function returns whether the program is executing on a particular
+device specified in @var{devicetype}. In C/C++ a non-zero value is
+returned to indicate the device is execiting on the specified device type.
+In Fortran, @code{true} will be returned. If the program is not executing
+on the specified device type C/C++ will return a zero, while Fortran will
+return @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
+@item @tab @code{integer(acc_device_kind) devicetype}
+@item @tab @code{logical acc_on_device}
+@end multitable
+
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.14.
+@end table
+
+
+
+@node acc_malloc
+@section @code{acc_malloc} -- Allocate device memory.
+@table @asis
+@item @emph{Description}
+This function allocates @var{len} bytes of device memory. It returns
+the device address of the allocated memory.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.15.
+@end table
+
+
+
+@node acc_free
+@section @code{acc_free} -- Free device memory.
+@table @asis
+@item @emph{Description}
+Free previously allocated device memory at the device address @code{a}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.16.
+@end table
+
+
+
+@node acc_copyin
+@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
+@table @asis
+@item @emph{Description}
+In C/C++, this function allocates @var{len} bytes of device memory
+and maps it to the specified host address in @var{a}. The device
+address of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a
+variable or array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.17.
+@end table
+
+
+
+@node acc_present_or_copyin
+@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and the host memory copied. The device address of
+the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
+@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.18.
+@end table
+
+
+
+@node acc_create
+@section @code{acc_create} -- Allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function allocates device memory and maps it to host memory specified
+by the host address @var{a} with a length of @var{len} bytes. In C/C++,
+the function returns the device address of the allocated device memory.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.19.
+@end table
+
+
+
+@node acc_present_or_create
+@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and mapped to host memory. In C/C++, the device address
+of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
+@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.20.
+@end table
+
+
+
+@node acc_copyout
+@section @code{acc_copyout} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies mapped device memory to host memory which is specified
+by host address @var{a} for a length @var{len} bytes in C/C++.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.21.
+@end table
+
+
+
+@node acc_delete
+@section @code{acc_delete} -- Free device memory.
+@table @asis
+@item @emph{Description}
+This function frees previously allocated device memory specified by
+the device address @var{a} and the length of @var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.22.
+@end table
+
+
+
+@node acc_update_device
+@section @code{acc_update_device} -- Update device memory from mapped host memory.
+@table @asis
+@item @emph{Description}
+This function updates the device copy from the previously mapped host memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.23.
+@end table
+
+
+
+@node acc_update_self
+@section @code{acc_update_self} -- Update host memory from mapped device memory.
+@table @asis
+@item @emph{Description}
+This function updates the host copy from the previously mapped device memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.24.
+@end table
+
+
+
+@node acc_map_data
+@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function maps previously allocated device and host memory. The device
+memory is specified with the device address @var{d}. The host memory is
+specified with the host address @var{h} and a length of @var{len}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.25.
+@end table
+
+
+
+@node acc_unmap_data
+@section @code{acc_unmap_data} -- Unmap device memory from host memory.
+@table @asis
+@item @emph{Description}
+This function unmaps previously mapped device and host memory. The latter
+specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.26.
+@end table
+
+
+
+@node acc_deviceptr
+@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
+@table @asis
+@item @emph{Description}
+This function returns the device address that has been mapped to the
+host address specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.27.
+@end table
+
+
+
+@node acc_hostptr
+@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
+@table @asis
+@item @emph{Description}
+This function returns the host address that has been mapped to the
+device address specified by @var{d}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.28.
+@end table
+
+
+
+@node acc_is_present
+@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
+@table @asis
+@item @emph{Description}
+This function indicates whether the specified host address in @var{a} and a
+length of @var{len} bytes is present on the device. In C/C++, a non-zero
+value is returned to indicate the presence of the mapped memory on the
+device. A zero is returned to indicate the memory is not mapped on the
+device.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes. If the host
+memory is mapped to device memory, then a @code{true} is returned. Otherwise,
+a @code{false} is return to indicate the mapped memory is not present.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_is_present(a)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{logical acc_is_present}
+@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
+@item @tab @code{type, dimension(:[,:]...) :: a}
+@item @tab @code{integer len}
+@item @tab @code{logical acc_is_present}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.29.
+@end table
+
+
+
+@node acc_memcpy_to_device
+@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} to
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.30.
+@end table
+
+
+
+@node acc_memcpy_from_device
+@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} from
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.31.
+@end table
+
+
+
+@node acc_get_current_cuda_device
+@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA device handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.1.
+@end table
+
+
+
+@node acc_get_current_cuda_context
+@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA context handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.2.
+@end table
+
+
+
+@node acc_get_cuda_stream
+@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA stream handle for the queue @var{async}.
+This handle is the same as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.3.
+@end table
+
+
+
+@node acc_set_cuda_stream
+@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function associates the stream handle specified by @var{stream} with
+the queue @var{async}.
+
+This cannot be used to change the stream handle associated with
+@code{acc_async_sync}.
+
+The return value is not specified.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.4.
+@end table
+
+
+
+@node acc_prof_register
+@section @code{acc_prof_register} -- Register callbacks.
+@table @asis
+@item @emph{Description}:
+This function registers callbacks.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@node acc_prof_unregister
+@section @code{acc_prof_unregister} -- Unregister callbacks.
+@table @asis
+@item @emph{Description}:
+This function unregisters callbacks.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@node acc_prof_lookup
+@section @code{acc_prof_lookup} -- Obtain inquiry functions.
+@table @asis
+@item @emph{Description}:
+Function to obtain inquiry functions.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@node acc_register_library
+@section @code{acc_register_library} -- Library registration.
+@table @asis
+@item @emph{Description}:
+Function for library registration.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Environment Variables
+@c ---------------------------------------------------------------------
+
+@node OpenACC Environment Variables
+@chapter OpenACC Environment Variables
+
+The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
+are defined by section 4 of the OpenACC specification in version 2.0.
+The variable @env{ACC_PROFLIB}
+is defined by section 4 of the OpenACC specification in version 2.6.
+The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
+
+@menu
+* ACC_DEVICE_TYPE::
+* ACC_DEVICE_NUM::
+* ACC_PROFLIB::
+* GCC_ACC_NOTIFY::
+@end menu
+
+
+
+@node ACC_DEVICE_TYPE
+@section @code{ACC_DEVICE_TYPE}
+@table @asis
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+4.1.
+@end table
+
+
+
+@node ACC_DEVICE_NUM
+@section @code{ACC_DEVICE_NUM}
+@table @asis
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+4.2.
+@end table
+
+
+
+@node ACC_PROFLIB
+@section @code{ACC_PROFLIB}
+@table @asis
+@item @emph{See also}:
+@ref{acc_register_library}, @ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+4.3.
+@end table
+
+
+
+@node GCC_ACC_NOTIFY
+@section @code{GCC_ACC_NOTIFY}
+@table @asis
+@item @emph{Description}:
+Print debug information pertaining to the accelerator.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c CUDA Streams Usage
+@c ---------------------------------------------------------------------
+
+@node CUDA Streams Usage
+@chapter CUDA Streams Usage
+
+This applies to the @code{nvptx} plugin only.
+
+The library provides elements that perform asynchronous movement of
+data and asynchronous operation of computing constructs. This
+asynchronous functionality is implemented by making use of CUDA
+streams@footnote{See "Stream Management" in "CUDA Driver API",
+TRM-06703-001, Version 5.5, for additional information}.
+
+The primary means by that the asychronous functionality is accessed
+is through the use of those OpenACC directives which make use of the
+@code{async} and @code{wait} clauses. When the @code{async} clause is
+first used with a directive, it creates a CUDA stream. If an
+@code{async-argument} is used with the @code{async} clause, then the
+stream is associated with the specified @code{async-argument}.
+
+Following the creation of an association between a CUDA stream and the
+@code{async-argument} of an @code{async} clause, both the @code{wait}
+clause and the @code{wait} directive can be used. When either the
+clause or directive is used after stream creation, it creates a
+rendezvous point whereby execution waits until all operations
+associated with the @code{async-argument}, that is, stream, have
+completed.
+
+Normally, the management of the streams that are created as a result of
+using the @code{async} clause, is done without any intervention by the
+caller. This implies the association between the @code{async-argument}
+and the CUDA stream will be maintained for the lifetime of the program.
+However, this association can be changed through the use of the library
+function @code{acc_set_cuda_stream}. When the function
+@code{acc_set_cuda_stream} is called, the CUDA stream that was
+originally associated with the @code{async} clause will be destroyed.
+Caution should be taken when changing the association as subsequent
+references to the @code{async-argument} refer to a different
+CUDA stream.
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Library Interoperability
+@c ---------------------------------------------------------------------
+
+@node OpenACC Library Interoperability
+@chapter OpenACC Library Interoperability
+
+@section Introduction
+
+The OpenACC library uses the CUDA Driver API, and may interact with
+programs that use the Runtime library directly, or another library
+based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
+"Interactions with the CUDA Driver API" in
+"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
+Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
+for additional information on library interoperability.}.
+This chapter describes the use cases and what changes are
+required in order to use both the OpenACC library and the CUBLAS and Runtime
+libraries within a program.
+
+@section First invocation: NVIDIA CUBLAS library API
+
+In this first use case (see below), a function in the CUBLAS library is called
+prior to any of the functions in the OpenACC library. More specifically, the
+function @code{cublasCreate()}.
+
+When invoked, the function initializes the library and allocates the
+hardware resources on the host and the device on behalf of the caller. Once
+the initialization and allocation has completed, a handle is returned to the
+caller. The OpenACC library also requires initialization and allocation of
+hardware resources. Since the CUBLAS library has already allocated the
+hardware resources for the device, all that is left to do is to initialize
+the OpenACC library and acquire the hardware resources on the host.
+
+Prior to calling the OpenACC function that initializes the library and
+allocate the host hardware resources, you need to acquire the device number
+that was allocated during the call to @code{cublasCreate()}. The invoking of the
+runtime library function @code{cudaGetDevice()} accomplishes this. Once
+acquired, the device number is passed along with the device type as
+parameters to the OpenACC library function @code{acc_set_device_num()}.
+
+Once the call to @code{acc_set_device_num()} has completed, the OpenACC
+library uses the context that was created during the call to
+@code{cublasCreate()}. In other words, both libraries will be sharing the
+same context.
+
+@smallexample
+ /* Create the handle */
+ s = cublasCreate(&h);
+ if (s != CUBLAS_STATUS_SUCCESS)
+ @{
+ fprintf(stderr, "cublasCreate failed %d\n", s);
+ exit(EXIT_FAILURE);
+ @}
+
+ /* Get the device number */
+ e = cudaGetDevice(&dev);
+ if (e != cudaSuccess)
+ @{
+ fprintf(stderr, "cudaGetDevice failed %d\n", e);
+ exit(EXIT_FAILURE);
+ @}
+
+ /* Initialize OpenACC library and use device 'dev' */
+ acc_set_device_num(dev, acc_device_nvidia);
+
+@end smallexample
+@center Use Case 1
+
+@section First invocation: OpenACC library API
+
+In this second use case (see below), a function in the OpenACC library is
+called prior to any of the functions in the CUBLAS library. More specificially,
+the function @code{acc_set_device_num()}.
+
+In the use case presented here, the function @code{acc_set_device_num()}
+is used to both initialize the OpenACC library and allocate the hardware
+resources on the host and the device. In the call to the function, the
+call parameters specify which device to use and what device
+type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
+is but one method to initialize the OpenACC library and allocate the
+appropriate hardware resources. Other methods are available through the
+use of environment variables and these will be discussed in the next section.
+
+Once the call to @code{acc_set_device_num()} has completed, other OpenACC
+functions can be called as seen with multiple calls being made to
+@code{acc_copyin()}. In addition, calls can be made to functions in the
+CUBLAS library. In the use case a call to @code{cublasCreate()} is made
+subsequent to the calls to @code{acc_copyin()}.
+As seen in the previous use case, a call to @code{cublasCreate()}
+initializes the CUBLAS library and allocates the hardware resources on the
+host and the device. However, since the device has already been allocated,
+@code{cublasCreate()} will only initialize the CUBLAS library and allocate
+the appropriate hardware resources on the host. The context that was created
+as part of the OpenACC initialization is shared with the CUBLAS library,
+similarly to the first use case.
+
+@smallexample
+ dev = 0;
+
+ acc_set_device_num(dev, acc_device_nvidia);
+
+ /* Copy the first set to the device */
+ d_X = acc_copyin(&h_X[0], N * sizeof (float));
+ if (d_X == NULL)
+ @{
+ fprintf(stderr, "copyin error h_X\n");
+ exit(EXIT_FAILURE);
+ @}
+
+ /* Copy the second set to the device */
+ d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
+ if (d_Y == NULL)
+ @{
+ fprintf(stderr, "copyin error h_Y1\n");
+ exit(EXIT_FAILURE);
+ @}
+
+ /* Create the handle */
+ s = cublasCreate(&h);
+ if (s != CUBLAS_STATUS_SUCCESS)
+ @{
+ fprintf(stderr, "cublasCreate failed %d\n", s);
+ exit(EXIT_FAILURE);
+ @}
+
+ /* Perform saxpy using CUBLAS library function */
+ s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
+ if (s != CUBLAS_STATUS_SUCCESS)
+ @{
+ fprintf(stderr, "cublasSaxpy failed %d\n", s);
+ exit(EXIT_FAILURE);
+ @}
+
+ /* Copy the results from the device */
+ acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
+
+@end smallexample
+@center Use Case 2
+
+@section OpenACC library and environment variables
+
+There are two environment variables associated with the OpenACC library
+that may be used to control the device type and device number:
+@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respecively. These two
+environement variables can be used as an alternative to calling
+@code{acc_set_device_num()}. As seen in the second use case, the device
+type and device number were specified using @code{acc_set_device_num()}.
+If however, the aforementioned environment variables were set, then the
+call to @code{acc_set_device_num()} would not be required.
+
+
+The use of the environment variables is only relevant when an OpenACC function
+is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
+is called prior to a call to an OpenACC function, then you must call
+@code{acc_set_device_num()}@footnote{More complete information
+about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
+sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC}
+Application Programming Interfaceā€¯, Version 2.0.}
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Profiling Interface
+@c ---------------------------------------------------------------------
+
+@node OpenACC Profiling Interface
+@chapter OpenACC Profiling Interface
+
+@section Implementation Status and Implementation-Defined Behavior
+
+We're implementing the OpenACC Profiling Interface as defined by the
+OpenACC 2.6 specification. We're clarifying some aspects here as
+@emph{implementation-defined behavior}, while they're still under
+discussion within the OpenACC Technical Committee.
+
+This implementation is tuned to keep the performance impact as low as
+possible for the (very common) case that the Profiling Interface is
+not enabled. This is relevant, as the Profiling Interface affects all
+the @emph{hot} code paths (in the target code, not in the offloaded
+code). Users of the OpenACC Profiling Interface can be expected to
+understand that performance will be impacted to some degree once the
+Profiling Interface has gotten enabled: for example, because of the
+@emph{runtime} (libgomp) calling into a third-party @emph{library} for
+every event that has been registered.
+
+We're not yet accounting for the fact that @cite{OpenACC events may
+occur during event processing}.
+
+We're not yet implementing initialization via a
+@code{acc_register_library} function that is either statically linked
+in, or dynamically via @env{LD_PRELOAD}.
+Initialization via @code{acc_register_library} functions dynamically
+loaded via the @env{ACC_PROFLIB} environment variable does work, as
+does directly calling @code{acc_prof_register},
+@code{acc_prof_unregister}, @code{acc_prof_lookup}.
+
+As currently there are no inquiry functions defined, calls to
+@code{acc_prof_lookup} will always return @code{NULL}.
+
+There aren't separate @emph{start}, @emph{stop} events defined for the
+event types @code{acc_ev_create}, @code{acc_ev_delete},
+@code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these
+should be triggered before or after the actual device-specific call is
+made. We trigger them after.
+
+Remarks about data provided to callbacks:
+
+@table @asis
+
+@item @code{acc_prof_info.event_type}
+It's not clear if for @emph{nested} event callbacks (for example,
+@code{acc_ev_enqueue_launch_start} as part of a parent compute
+construct), this should be set for the nested event
+(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
+construct should remain (@code{acc_ev_compute_construct_start}). In
+this implementation, the value will generally correspond to the
+innermost nested event type.
+
+@item @code{acc_prof_info.device_type}
+@itemize
+
+@item
+For @code{acc_ev_compute_construct_start}, and in presence of an
+@code{if} clause with @emph{false} argument, this will still refer to
+the offloading device type.
+It's not clear if that's the expected behavior.
+
+@item
+Complementary to the item before, for
+@code{acc_ev_compute_construct_end}, this is set to
+@code{acc_device_host} in presence of an @code{if} clause with
+@emph{false} argument.
+It's not clear if that's the expected behavior.
+
+@end itemize
+
+@item @code{acc_prof_info.thread_id}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.async}
+@itemize
+
+@item
+Not yet implemented correctly for
+@code{acc_ev_compute_construct_start}.
+
+@item
+In a compute construct, for host-fallback
+execution/@code{acc_device_host} it will always be
+@code{acc_async_sync}.
+It's not clear if that's the expected behavior.
+
+@item
+For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
+it will always be @code{acc_async_sync}.
+It's not clear if that's the expected behavior.
+
+@end itemize
+
+@item @code{acc_prof_info.async_queue}
+There is no @cite{limited number of asynchronous queues} in libgomp.
+This will always have the same value as @code{acc_prof_info.async}.
+
+@item @code{acc_prof_info.src_file}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_prof_info.func_name}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_prof_info.line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.end_line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.func_line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.func_end_line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
+Relating to @code{acc_prof_info.event_type} discussed above, in this
+implementation, this will always be the same value as
+@code{acc_prof_info.event_type}.
+
+@item @code{acc_event_info.*.parent_construct}
+@itemize
+
+@item
+Will be @code{acc_construct_parallel} for all OpenACC compute
+constructs as well as many OpenACC Runtime API calls; should be the
+one matching the actual construct, or
+@code{acc_construct_runtime_api}, respectively.
+
+@item
+Will be @code{acc_construct_enter_data} or
+@code{acc_construct_exit_data} when processing variable mappings
+specified in OpenACC @emph{declare} directives; should be
+@code{acc_construct_declare}.
+
+@item
+For implicit @code{acc_ev_device_init_start},
+@code{acc_ev_device_init_end}, and explicit as well as implicit
+@code{acc_ev_alloc}, @code{acc_ev_free},
+@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
+@code{acc_ev_enqueue_download_start}, and
+@code{acc_ev_enqueue_download_end}, will be
+@code{acc_construct_parallel}; should reflect the real parent
+construct.
+
+@end itemize
+
+@item @code{acc_event_info.*.implicit}
+For @code{acc_ev_alloc}, @code{acc_ev_free},
+@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
+@code{acc_ev_enqueue_download_start}, and
+@code{acc_ev_enqueue_download_end}, this currently will be @code{1}
+also for explicit usage.
+
+@item @code{acc_event_info.data_event.var_name}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_event_info.data_event.host_ptr}
+For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always
+@code{NULL}.
+
+@item @code{typedef union acc_api_info}
+@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
+Information}. This should obviously be @code{typedef @emph{struct}
+acc_api_info}.
+
+@item @code{acc_api_info.device_api}
+Possibly not yet implemented correctly for
+@code{acc_ev_compute_construct_start},
+@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
+will always be @code{acc_device_api_none} for these event types.
+For @code{acc_ev_enter_data_start}, it will be
+@code{acc_device_api_none} in some cases.
+
+@item @code{acc_api_info.device_type}
+Always the same as @code{acc_prof_info.device_type}.
+
+@item @code{acc_api_info.vendor}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_api_info.device_handle}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_api_info.context_handle}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_api_info.async_handle}
+Always @code{NULL}; not yet implemented.
+
+@end table
+
+Remarks about certain event types:
+
+@table @asis
+
+@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
+@itemize
+
+@item
+@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
+@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
+@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
+Whan a compute construct triggers implicit
+@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
+events, they currently aren't @emph{nested within} the corresponding
+@code{acc_ev_compute_construct_start} and
+@code{acc_ev_compute_construct_end}, but they're currently observed
+@emph{before} @code{acc_ev_compute_construct_start}.
+It's not clear what to do: the standard asks us provide a lot of
+details to the @code{acc_ev_compute_construct_start} callback, without
+(implicitly) initializing a device before?
+
+@item
+Callbacks for these event types will not be invoked for calls to the
+@code{acc_set_device_type} and @code{acc_set_device_num} functions.
+It's not clear if they should be.
+
+@end itemize
+
+@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
+@itemize
+
+@item
+Callbacks for these event types will also be invoked for OpenACC
+@emph{host_data} constructs.
+It's not clear if they should be.
+
+@item
+Callbacks for these event types will also be invoked when processing
+variable mappings specified in OpenACC @emph{declare} directives.
+It's not clear if they should be.
+
+@end itemize
+
+@end table
+
+Callbacks for the following event types will be invoked, but dispatch
+and information provided therein has not yet been thoroughly reviewed:
+
+@itemize
+@item @code{acc_ev_alloc}
+@item @code{acc_ev_free}
+@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
+@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
+@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
+@end itemize
+
+During device initialization, and finalization, respectively,
+callbacks for the following event types will not yet be invoked:
+
+@itemize
+@item @code{acc_ev_alloc}
+@item @code{acc_ev_free}
+@end itemize
+
+Callbacks for the following event types have not yet been implemented,
+so currently won't be invoked:
+
+@itemize
+@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
+@item @code{acc_ev_runtime_shutdown}
+@item @code{acc_ev_create}, @code{acc_ev_delete}
+@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
+@end itemize
+
+For the following runtime library functions, not all expected
+callbacks will be invoked (mostly concerning implicit device
+initialization):
+
+@itemize
+@item @code{acc_get_num_devices}
+@item @code{acc_set_device_type}
+@item @code{acc_get_device_type}
+@item @code{acc_set_device_num}
+@item @code{acc_get_device_num}
+@item @code{acc_init}
+@item @code{acc_shutdown}
+@end itemize
+
+Aside from implicit device initialization, for the following runtime
+library functions, no callbacks will be invoked for shared-memory
+offloading devices (it's not clear if they should be):
+
+@itemize
+@item @code{acc_malloc}
+@item @code{acc_free}
+@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
+@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
+@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
+@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
+@item @code{acc_update_device}, @code{acc_update_device_async}
+@item @code{acc_update_self}, @code{acc_update_self_async}
+@item @code{acc_map_data}, @code{acc_unmap_data}
+@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
+@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
+@end itemize
+
+
+
+@c ---------------------------------------------------------------------
+@c The libgomp ABI
+@c ---------------------------------------------------------------------
+
+@node The libgomp ABI
+@chapter The libgomp ABI
+
+The following sections present notes on the external ABI as
+presented by libgomp. Only maintainers should need them.
+
+@menu
+* Implementing MASTER construct::
+* Implementing CRITICAL construct::
+* Implementing ATOMIC construct::
+* Implementing FLUSH construct::
+* Implementing BARRIER construct::
+* Implementing THREADPRIVATE construct::
+* Implementing PRIVATE clause::
+* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
+* Implementing REDUCTION clause::
+* Implementing PARALLEL construct::
+* Implementing FOR construct::
+* Implementing ORDERED construct::
+* Implementing SECTIONS construct::
+* Implementing SINGLE construct::
+* Implementing OpenACC's PARALLEL construct::
+@end menu
+
+
+@node Implementing MASTER construct
+@section Implementing MASTER construct
+
+@smallexample
+if (omp_get_thread_num () == 0)
+ block
+@end smallexample
+
+Alternately, we generate two copies of the parallel subfunction
+and only include this in the version run by the master thread.
+Surely this is not worthwhile though...
+
+
+
+@node Implementing CRITICAL construct
+@section Implementing CRITICAL construct
+
+Without a specified name,
+
+@smallexample
+ void GOMP_critical_start (void);
+ void GOMP_critical_end (void);
+@end smallexample
+
+so that we don't get COPY relocations from libgomp to the main
+application.
+
+With a specified name, use omp_set_lock and omp_unset_lock with
+name being transformed into a variable declared like
+
+@smallexample
+ omp_lock_t gomp_critical_user_<name> __attribute__((common))
+@end smallexample
+
+Ideally the ABI would specify that all zero is a valid unlocked
+state, and so we wouldn't need to initialize this at
+startup.
+
+
+
+@node Implementing ATOMIC construct
+@section Implementing ATOMIC construct
+
+The target should implement the @code{__sync} builtins.
+
+Failing that we could add
+
+@smallexample
+ void GOMP_atomic_enter (void)
+ void GOMP_atomic_exit (void)
+@end smallexample
+
+which reuses the regular lock code, but with yet another lock
+object private to the library.
+
+
+
+@node Implementing FLUSH construct
+@section Implementing FLUSH construct
+
+Expands to the @code{__sync_synchronize} builtin.
+
+
+
+@node Implementing BARRIER construct
+@section Implementing BARRIER construct
+
+@smallexample
+ void GOMP_barrier (void)
+@end smallexample
+
+
+@node Implementing THREADPRIVATE construct
+@section Implementing THREADPRIVATE construct
+
+In _most_ cases we can map this directly to @code{__thread}. Except
+that OMP allows constructors for C++ objects. We can either
+refuse to support this (how often is it used?) or we can
+implement something akin to .ctors.
+
+Even more ideally, this ctor feature is handled by extensions
+to the main pthreads library. Failing that, we can have a set
+of entry points to register ctor functions to be called.
+
+
+
+@node Implementing PRIVATE clause
+@section Implementing PRIVATE clause
+
+In association with a PARALLEL, or within the lexical extent
+of a PARALLEL block, the variable becomes a local variable in
+the parallel subfunction.
+
+In association with FOR or SECTIONS blocks, create a new
+automatic variable within the current function. This preserves
+the semantic of new variable creation.
+
+
+
+@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
@section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
This seems simple enough for PARALLEL blocks. Create a private
+@node Implementing OpenACC's PARALLEL construct
+@section Implementing OpenACC's PARALLEL construct
+
+@smallexample
+ void GOACC_parallel ()
+@end smallexample
+
+
+
@c ---------------------------------------------------------------------
-@c
+@c Reporting Bugs
@c ---------------------------------------------------------------------
@node Reporting Bugs
@chapter Reporting Bugs
-Bugs in the GNU OpenMP implementation should be reported via
-@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}. For all cases, please add
-"openmp" to the keywords field in the bug report.
+Bugs in the GNU Offloading and Multi Processing Runtime Library should
+be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
+"openacc", or "openmp", or both to the keywords field in the bug
+report, as appropriate.