Add a pass_by_reference flag to function_arg_info

[thirdparty/gcc.git] / libgomp / libgomp.texi
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi

index 8461c5b0c7ce0bdcde52ccc9a52d762f7129e26e..6db895f627266e45cb376f1848dec1ae71f373c5 100644 (file)
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -7,7 +7,7 @@
  
  
  @copying
-Copyright @copyright{} 2006-2014 Free Software Foundation, Inc.
+Copyright @copyright{} 2006-2019 Free Software Foundation, Inc.
  
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.3 or
@@ -31,11 +31,13 @@ texts being (a) (see below), and with the Back-Cover Texts being (b)
  @ifinfo
  @dircategory GNU Libraries
  @direntry
-* libgomp: (libgomp).                    GNU OpenMP runtime library
+* libgomp: (libgomp).          GNU Offloading and Multi Processing Runtime Library.
  @end direntry
  
-This manual documents the GNU implementation of the OpenMP API for 
-multi-platform shared-memory parallel programming in C/C++ and Fortran.
+This manual documents libgomp, the GNU Offloading and Multi Processing
+Runtime library.  This is the GNU implementation of the OpenMP and
+OpenACC APIs for parallel and accelerator programming in C/C++ and
+Fortran.
  
  Published by the Free Software Foundation
  51 Franklin Street, Fifth Floor
@@ -48,7 +50,8 @@ Boston, MA 02110-1301 USA
  @setchapternewpage odd
  
  @titlepage
-@title The GNU OpenMP Implementation
+@title GNU Offloading and Multi Processing Runtime Library
+@subtitle The GNU OpenMP and OpenACC Implementation
  @page
  @vskip 0pt plus 1filll
  @comment For the @value{version-GCC} Version*
@@ -69,9 +72,19 @@ Boston, MA 02110-1301, USA@*
  @top Introduction
  @cindex Introduction
  
-This manual documents the usage of libgomp, the GNU implementation of the 
-@uref{http://www.openmp.org, OpenMP} Application Programming Interface (API)
-for multi-platform shared-memory parallel programming in C/C++ and Fortran.
+This manual documents the usage of libgomp, the GNU Offloading and
+Multi Processing Runtime Library.  This includes the GNU
+implementation of the @uref{https://www.openmp.org, OpenMP} Application
+Programming Interface (API) for multi-platform shared-memory parallel
+programming in C/C++ and Fortran, and the GNU implementation of the
+@uref{https://www.openacc.org, OpenACC} Application Programming
+Interface (API) for offloading of code to accelerator devices in C/C++
+and Fortran.
+
+Originally, libgomp implemented the GNU OpenMP Runtime Library.  Based
+on this, support for OpenACC and offloading (both OpenACC and OpenMP
+4's target construct) has been added later on, and the library's name
+changed to GNU Offloading and Multi Processing Runtime Library.
  
  
  
@@ -82,12 +95,26 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran.
  @comment
  @menu
  * Enabling OpenMP::            How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+                               The OpenMP runtime application programming
                                 interface.
-* Environment Variables::      Influencing runtime behavior with environment 
-                               variables.
+* OpenMP Environment Variables: Environment Variables.
+                               Influencing OpenMP runtime behavior with
+                               environment variables.
+* Enabling OpenACC::           How to enable OpenACC for your
+                               applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+                               programming interface.
+* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
+                               environment variables.
+* CUDA Streams Usage::         Notes on the implementation of
+                               asynchronous operations.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+                               NVIDIA CUBLAS library.
+* OpenACC Profiling Interface::
  * The libgomp ABI::            Notes on the external ABI presented by libgomp.
-* Reporting Bugs::             How to report bugs in GNU OpenMP.
+* Reporting Bugs::             How to report bugs in the GNU Offloading and
+                               Multi Processing Runtime Library.
  * Copying::                    GNU general public license says
                                 how you can copy and share libgomp.
  * GNU Free Documentation License::
@@ -115,25 +142,25 @@ arranges for automatic linking of the OpenMP runtime library
  (@ref{Runtime Library Routines}).
  
  A complete description of all OpenMP directives accepted may be found in 
-the @uref{http://www.openmp.org, OpenMP Application Program Interface} manual,
-version 4.0.
+the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
+version 4.5.
  
  
  @c ---------------------------------------------------------------------
-@c Runtime Library Routines
+@c OpenMP Runtime Library Routines
  @c ---------------------------------------------------------------------
  
  @node Runtime Library Routines
-@chapter Runtime Library Routines
+@chapter OpenMP Runtime Library Routines
  
  The runtime routines described here are defined by Section 3 of the OpenMP
-specification in version 4.0.  The routines are structured in following
+specification in version 4.5.  The routines are structured in following
  three parts:
  
+@menu
  Control threads, processors and the parallel environment.  They have C
  linkage, and do not throw exceptions.
  
-@menu
  * omp_get_active_level::        Number of active parallel regions
  * omp_get_ancestor_thread_num:: Ancestor thread ID
  * omp_get_cancellation::        Whether cancellation support is enabled
@@ -141,6 +168,7 @@ linkage, and do not throw exceptions.
  * omp_get_dynamic::             Dynamic teams setting
  * omp_get_level::               Number of parallel regions
  * omp_get_max_active_levels::   Maximum number of active regions
+* omp_get_max_task_priority::   Maximum task priority value that can be set
  * omp_get_max_threads::         Maximum number of threads of parallel region
  * omp_get_nested::              Nested parallel regions
  * omp_get_num_devices::         Number of target devices
@@ -162,11 +190,9 @@ linkage, and do not throw exceptions.
  * omp_set_nested::              Enable/disable nested parallel regions
  * omp_set_num_threads::         Set upper team size limit
  * omp_set_schedule::            Set the runtime scheduling method
-@end menu
  
  Initialize, set, test, unset and destroy simple and nested locks.
  
-@menu
  * omp_init_lock::            Initialize simple lock
  * omp_set_lock::             Wait for and set simple lock
  * omp_test_lock::            Test and set simple lock if available
@@ -177,11 +203,9 @@ Initialize, set, test, unset and destroy simple and nested locks.
  * omp_test_nest_lock::       Test and set nested lock if available
  * omp_unset_nest_lock::      Unset nested lock
  * omp_destroy_nest_lock::    Destroy nested lock
-@end menu
  
  Portable, thread-based, wall clock timer.
  
-@menu
  * omp_get_wtick::            Get timer precision.
  * omp_get_wtime::            Elapsed wall clock time.
  @end menu
@@ -209,7 +233,7 @@ which enclose the calling call.
  @ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.20.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
  @end table
  
  
@@ -238,7 +262,7 @@ zero to @code{omp_get_level} -1 is returned; if @var{level} is
  @ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.18.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
  @end table
  
  
@@ -266,7 +290,7 @@ deactivated.
  @ref{OMP_CANCELLATION}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.9.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
  @end table
  
  
@@ -291,7 +315,7 @@ Get the default device for target regions without device clause.
  @ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.24.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
  @end table
  
  
@@ -323,7 +347,7 @@ disabled by default.
  @ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.8.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
  @end table
  
  
@@ -349,7 +373,7 @@ which enclose the calling call.
  @ref{omp_get_active_level}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.17.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
  @end table
  
  
@@ -374,10 +398,31 @@ This function obtains the maximum allowed number of nested, active parallel regi
  @ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.16.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
  @end table
  
  
+@node omp_get_max_task_priority
+@section @code{omp_get_max_task_priority} -- Maximum priority value
+that can be set for tasks.
+@table @asis
+@item @emph{Description}:
+This function obtains the maximum allowed priority number for tasks.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
+@end table
+
  
  @node omp_get_max_threads
  @section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
@@ -400,7 +445,7 @@ that does not use the clause @code{num_threads}.
  @ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.3.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
  @end table
  
  
@@ -432,7 +477,7 @@ disabled by default.
  @ref{omp_set_nested}, @ref{OMP_NESTED}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.11.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
  @end table
  
  
@@ -454,7 +499,7 @@ Returns the number of target devices.
  @end multitable
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.25.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
  @end table
  
  
@@ -476,7 +521,7 @@ Returns the number of processors online on that device.
  @end multitable
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.5.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
  @end table
  
  
@@ -498,7 +543,7 @@ Returns the number of teams in the current team region.
  @end multitable
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.26.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
  @end table
  
  
@@ -531,7 +576,7 @@ one thread per CPU online is used.
  @ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
  @end table
  
  
@@ -559,7 +604,7 @@ set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
  @ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY},
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.22.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
  @end table
  
  
@@ -571,25 +616,25 @@ set via @env{OMP_PROC_BIND}.  Possible values are @code{omp_proc_bind_false},
  Obtain the runtime scheduling method.  The @var{kind} argument will be
  set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
  @code{omp_sched_guided} or @code{omp_sched_auto}.  The second argument,
-@var{modifier}, is set to the chunk size.
+@var{chunk_size}, is set to the chunk size.
  
  @item @emph{C/C++}
  @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *modifier);}
+@item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
  @end multitable
  
  @item @emph{Fortran}:
  @multitable @columnfractions .20 .80
-@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, modifier)}
+@item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
  @item                   @tab @code{integer(kind=omp_sched_kind) kind}
-@item                   @tab @code{integer modifier}
+@item                   @tab @code{integer chunk_size}
  @end multitable
  
  @item @emph{See also}:
  @ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.13.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
  @end table
  
  
@@ -611,7 +656,7 @@ Returns the team number of the calling thread.
  @end multitable
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.27.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
  @end table
  
  
@@ -641,7 +686,7 @@ to @code{omp_get_num_threads}.
  @ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.19.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
  @end table
  
  
@@ -666,7 +711,7 @@ Return the maximum number of threads of the program.
  @ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.14.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
  @end table
  
  
@@ -695,7 +740,7 @@ value of the master thread of a team is always 0.
  @ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.4.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
  @end table
  
  
@@ -719,7 +764,7 @@ their language-specific counterparts.
  @end multitable
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.6.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
  @end table
  
  
@@ -742,7 +787,7 @@ and @code{false} represent their language-specific counterparts.
  @end multitable
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.21.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
  @end table
  
  
@@ -766,7 +811,7 @@ their language-specific counterparts.
  @end multitable
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.28.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
  @end table
  
  
@@ -793,7 +838,7 @@ shall be a nonnegative device number.
  @ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.23.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
  @end table
  
  
@@ -822,7 +867,7 @@ adjustment of team sizes and @code{false} disables it.
  @ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.7.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
  @end table
  
  
@@ -849,7 +894,7 @@ parallel regions.
  @ref{omp_get_max_active_levels}, @ref{omp_get_active_level}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.15.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
  @end table
  
  
@@ -878,7 +923,7 @@ dynamic adjustment of team sizes and @code{false} disables it.
  @ref{OMP_NESTED}, @ref{omp_get_nested}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.10.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
  @end table
  
  
@@ -906,7 +951,7 @@ argument of @code{omp_set_num_threads} shall be a positive integer.
  @ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
  @end table
  
  
@@ -919,19 +964,19 @@ Sets the runtime scheduling method.  The @var{kind} argument can have the
  value @code{omp_sched_static}, @code{omp_sched_dynamic},
  @code{omp_sched_guided} or @code{omp_sched_auto}.  Except for
  @code{omp_sched_auto}, the chunk size is set to the value of
-@var{modifier} if positive, or to the default value if zero or negative.
-For @code{omp_sched_auto} the @var{modifier} argument is ignored.
+@var{chunk_size} if positive, or to the default value if zero or negative.
+For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.
  
  @item @emph{C/C++}
  @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int modifier);}
+@item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
  @end multitable
  
  @item @emph{Fortran}:
  @multitable @columnfractions .20 .80
-@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, modifier)}
+@item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
  @item                   @tab @code{integer(kind=omp_sched_kind) kind}
-@item                   @tab @code{integer modifier}
+@item                   @tab @code{integer chunk_size}
  @end multitable
  
  @item @emph{See also}:
@@ -939,7 +984,7 @@ For @code{omp_sched_auto} the @var{modifier} argument is ignored.
  @ref{OMP_SCHEDULE}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.2.12.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
  @end table
  
  
@@ -966,7 +1011,7 @@ an unlocked state.
  @ref{omp_destroy_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
  @end table
  
  
@@ -995,7 +1040,7 @@ a deadlock occurs.
  @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
  @end table
  
  
@@ -1025,7 +1070,7 @@ does not block if the lock is not available.  This function returns
  @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
  @end table
  
  
@@ -1055,7 +1100,7 @@ again, set the lock to itself.
  @ref{omp_set_lock}, @ref{omp_test_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
  @end table
  
  
@@ -1082,7 +1127,7 @@ in the unlocked state.
  @ref{omp_init_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
  @end table
  
  
@@ -1109,7 +1154,7 @@ an unlocked state and the nesting count is set to zero.
  @ref{omp_destroy_nest_lock}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
  @end table
  
  
@@ -1137,7 +1182,7 @@ nesting count for the lock is incremented.
  @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.3.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
  @end table
  
  
@@ -1168,7 +1213,7 @@ is returned.  Otherwise, the return value equals zero.
  @ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_set_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.5.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
  @end table
  
  
@@ -1198,7 +1243,7 @@ one of them is chosen to, again, set the lock to itself.
  @ref{omp_set_nest_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.4.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
  @end table
  
  
@@ -1225,7 +1270,7 @@ in the unlocked state and its nesting count must equal zero.
  @ref{omp_init_lock}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.3.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
  @end table
  
  
@@ -1251,7 +1296,7 @@ successive clock ticks.
  @ref{omp_get_wtime}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.2.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
  @end table
  
  
@@ -1279,39 +1324,42 @@ guaranteed not to change during the execution of the program.
  @ref{omp_get_wtick}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 3.4.1.
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
  @end table
  
  
  
  @c ---------------------------------------------------------------------
-@c Environment Variables
+@c OpenMP Environment Variables
  @c ---------------------------------------------------------------------
  
  @node Environment Variables
-@chapter Environment Variables
+@chapter OpenMP Environment Variables
  
  The environment variables which beginning with @env{OMP_} are defined by
-section 4 of the OpenMP specification in version 4.0, while those
+section 4 of the OpenMP specification in version 4.5, while those
  beginning with @env{GOMP_} are GNU extensions.
  
  @menu
-* OMP_CANCELLATION::      Set whether cancellation is activated
-* OMP_DISPLAY_ENV::       Show OpenMP version and environment variables
-* OMP_DEFAULT_DEVICE::    Set the device used in target regions
-* OMP_DYNAMIC::           Dynamic adjustment of threads
-* OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
-* OMP_NESTED::            Nested parallel regions
-* OMP_NUM_THREADS::       Specifies the number of threads to use
-* OMP_PROC_BIND::         Whether theads may be moved between CPUs
-* OMP_PLACES::            Specifies on which CPUs the theads should be placed
-* OMP_STACKSIZE::         Set default thread stack size
-* OMP_SCHEDULE::          How threads are scheduled
-* OMP_THREAD_LIMIT::      Set the maximum number of threads
-* OMP_WAIT_POLICY::       How waiting threads are handled
-* GOMP_CPU_AFFINITY::     Bind threads to specific CPUs
-* GOMP_STACKSIZE::        Set default thread stack size
-* GOMP_SPINCOUNT::        Set the busy-wait spin count
+* OMP_CANCELLATION::        Set whether cancellation is activated
+* OMP_DISPLAY_ENV::         Show OpenMP version and environment variables
+* OMP_DEFAULT_DEVICE::      Set the device used in target regions
+* OMP_DYNAMIC::             Dynamic adjustment of threads
+* OMP_MAX_ACTIVE_LEVELS::   Set the maximum number of nested parallel regions
+* OMP_MAX_TASK_PRIORITY::   Set the maximum task priority value
+* OMP_NESTED::              Nested parallel regions
+* OMP_NUM_THREADS::         Specifies the number of threads to use
+* OMP_PROC_BIND::           Whether theads may be moved between CPUs
+* OMP_PLACES::              Specifies on which CPUs the theads should be placed
+* OMP_STACKSIZE::           Set default thread stack size
+* OMP_SCHEDULE::            How threads are scheduled
+* OMP_THREAD_LIMIT::        Set the maximum number of threads
+* OMP_WAIT_POLICY::         How waiting threads are handled
+* GOMP_CPU_AFFINITY::       Bind threads to specific CPUs
+* GOMP_DEBUG::              Enable debugging output
+* GOMP_STACKSIZE::          Set default thread stack size
+* GOMP_SPINCOUNT::          Set the busy-wait spin count
+* GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
  @end menu
  
  
@@ -1327,7 +1375,7 @@ if unset, cancellation is disabled and the @code{cancel} construct is ignored.
  @ref{omp_get_cancellation}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
  @end table
  
  
@@ -1345,7 +1393,7 @@ this information will not be shown.
  
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.12
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
  @end table
  
  
@@ -1366,7 +1414,7 @@ device number 0 will be used.
  @ref{omp_get_default_device}, @ref{omp_set_default_device},
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.11
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
  @end table
  
  
@@ -1385,7 +1433,7 @@ disabled by default.
  @ref{omp_set_dynamic}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.3
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
  @end table
  
  
@@ -1403,7 +1451,27 @@ If undefined, the number of active levels is unlimited.
  @ref{omp_set_max_active_levels}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.9
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
+@end table
+
+
+
+@node OMP_MAX_TASK_PRIORITY
+@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority
+number that can be set for a task.
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Specifies the initial value for the maximum priority value that can be
+set for a task.  The value of this variable shall be a non-negative
+integer, and zero is allowed.  If undefined, the default priority is
+0.
+
+@item @emph{See also}:
+@ref{omp_get_max_task_priority}
+
+@item @emph{Reference}: 
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
  @end table
  
  
@@ -1423,7 +1491,7 @@ regions are disabled by default.
  @ref{omp_set_nested}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.6
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
  @end table
  
  
@@ -1443,7 +1511,34 @@ level.  If undefined one thread per CPU is used.
  @ref{omp_set_num_threads}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.2
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
+@end table
+
+
+
+@node OMP_PROC_BIND
+@section @env{OMP_PROC_BIND} -- Whether theads may be moved between CPUs
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Specifies whether threads may be moved between processors.  If set to
+@code{TRUE}, OpenMP theads should not be moved; if set to @code{FALSE}
+they may be moved.  Alternatively, a comma separated list with the
+values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
+the thread affinity policy for the corresponding nesting level.  With
+@code{MASTER} the worker threads are in the same place partition as the
+master thread.  With @code{CLOSE} those are kept close to the master thread
+in contiguous place partitions.  And with @code{SPREAD} a sparse distribution
+across the place partitions is used.
+
+When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
+@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
+
+@item @emph{See also}:
+@ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
  @end table
  
  
@@ -1485,34 +1580,27 @@ between CPUs following no placement policy.
  @ref{OMP_DISPLAY_ENV}
  
  @item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.5
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
  @end table
  
  
  
-@node OMP_PROC_BIND
-@section @env{OMP_PROC_BIND} -- Whether theads may be moved between CPUs
+@node OMP_STACKSIZE
+@section @env{OMP_STACKSIZE} -- Set default thread stack size
  @cindex Environment Variable
  @table @asis
  @item @emph{Description}:
-Specifies whether threads may be moved between processors.  If set to
-@code{TRUE}, OpenMP theads should not be moved; if set to @code{FALSE}
-they may be moved.  Alternatively, a comma separated list with the
-values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
-the thread affinity policy for the corresponding nesting level.  With
-@code{MASTER} the worker threads are in the same place partition as the
-master thread.  With @code{CLOSE} those are kept close to the master thread
-in contiguous place partitions.  And with @code{SPREAD} a sparse distribution
-across the place partitions is used.
-
-When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
-@env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
-
-@item @emph{See also}:
-@ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind}
+Set the default thread stack size in kilobytes, unless the number
+is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
+case the size is, respectively, in bytes, kilobytes, megabytes
+or gigabytes.  This is different from @code{pthread_attr_setstacksize}
+which gets the number of bytes as an argument.  If the stack size cannot
+be set due to system constraints, an error is reported and the initial
+stack size is left unchanged.  If undefined, the stack size is system
+dependent.
  
-@item @emph{Reference}:
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.4
+@item @emph{Reference}: 
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
  @end table
  
  
@@ -1533,27 +1621,7 @@ dynamic scheduling and a chunk size of 1 is used.
  @ref{omp_set_schedule}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Sections 2.7.1 and 4.1
-@end table
-
-
-
-@node OMP_STACKSIZE
-@section @env{OMP_STACKSIZE} -- Set default thread stack size
-@cindex Environment Variable
-@table @asis
-@item @emph{Description}:
-Set the default thread stack size in kilobytes, unless the number
-is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
-case the size is, respectively, in bytes, kilobytes, megabytes
-or gigabytes.  This is different from @code{pthread_attr_setstacksize}
-which gets the number of bytes as an argument.  If the stack size cannot
-be set due to system constraints, an error is reported and the initial
-stack size is left unchanged.  If undefined, the stack size is system
-dependent.
-
-@item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.7
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
  @end table
  
  
@@ -1571,7 +1639,7 @@ the number of threads is not limited.
  @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.10
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
  @end table
  
  
@@ -1591,7 +1659,7 @@ before waiting passively.
  @ref{GOMP_SPINCOUNT}
  
  @item @emph{Reference}: 
-@uref{http://www.openmp.org/, OpenMP specification v4.0}, Section 4.8
+@uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
  @end table
  
  
@@ -1611,7 +1679,7 @@ CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
  and 14 respectively and then start assigning back from the beginning of
  the list.  @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
  
-There is no GNU OpenMP library routine to determine whether a CPU affinity 
+There is no libgomp library routine to determine whether a CPU affinity
  specification is in effect.  As a workaround, language-specific library 
  functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in 
  Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY} 
@@ -1629,6 +1697,20 @@ If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
  
  
  
+@node GOMP_DEBUG
+@section @env{GOMP_DEBUG} -- Enable debugging output
+@cindex Environment Variable
+@table @asis
+@item @emph{Description}:
+Enable debugging output.  The variable should be set to @code{0}
+(disabled, also the default if not set), or @code{1} (enabled).
+
+If enabled, some debugging output will be printed during execution.
+This is currently not specified in more detail, and subject to change.
+@end table
+
+
+
  @node GOMP_STACKSIZE
  @section @env{GOMP_STACKSIZE} -- Set default thread stack size
  @cindex Environment Variable
@@ -1680,134 +1762,1862 @@ or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
  
  
  
+@node GOMP_RTEMS_THREAD_POOLS
+@section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
+@cindex Environment Variable
+@cindex Implementation specific setting
+@table @asis
+@item @emph{Description}:
+This environment variable is only used on the RTEMS real-time operating system.
+It determines the scheduler instance specific thread pools.  The format for
+@env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
+@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
+separated by @code{:} where:
+@itemize @bullet
+@item @code{<thread-pool-count>} is the thread pool count for this scheduler
+instance.
+@item @code{$<priority>} is an optional priority for the worker threads of a
+thread pool according to @code{pthread_setschedparam}.  In case a priority
+value is omitted, then a worker thread will inherit the priority of the OpenMP
+master thread that created it.  The priority of the worker thread is not
+changed after creation, even if a new OpenMP master thread using the worker has
+a different priority.
+@item @code{@@<scheduler-name>} is the scheduler instance name according to the
+RTEMS application configuration.
+@end itemize
+In case no thread pool configuration is specified for a scheduler instance,
+then each OpenMP master thread of this scheduler instance will use its own
+dynamically allocated thread pool.  To limit the worker thread count of the
+thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
+@item @emph{Example}:
+Lets suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
+@code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
+@code{"1@@WRK0:3$4@@WRK1"}.  Then there are no thread pool restrictions for
+scheduler instance @code{IO}.  In the scheduler instance @code{WRK0} there is
+one thread pool available.  Since no priority is specified for this scheduler
+instance, the worker thread inherits the priority of the OpenMP master thread
+that created it.  In the scheduler instance @code{WRK1} there are three thread
+pools available and their worker threads run at priority four.
+@end table
+
+
+
  @c ---------------------------------------------------------------------
-@c The libgomp ABI
+@c Enabling OpenACC
  @c ---------------------------------------------------------------------
  
-@node The libgomp ABI
-@chapter The libgomp ABI
+@node Enabling OpenACC
+@chapter Enabling OpenACC
  
-The following sections present notes on the external ABI as 
-presented by libgomp.  Only maintainers should need them.
+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
+flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
+@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
+@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
+@code{!$} conditional compilation sentinels in free form and @code{c$},
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
+arranges for automatic linking of the OpenACC runtime library 
+(@ref{OpenACC Runtime Library Routines}).
  
-@menu
-* Implementing MASTER construct::
-* Implementing CRITICAL construct::
-* Implementing ATOMIC construct::
-* Implementing FLUSH construct::
-* Implementing BARRIER construct::
-* Implementing THREADPRIVATE construct::
-* Implementing PRIVATE clause::
-* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
-* Implementing REDUCTION clause::
-* Implementing PARALLEL construct::
-* Implementing FOR construct::
-* Implementing ORDERED construct::
-* Implementing SECTIONS construct::
-* Implementing SINGLE construct::
-@end menu
+A complete description of all OpenACC directives accepted may be found in 
+the @uref{https://www.openacc.org, OpenACC} Application Programming
+Interface manual, version 2.0.
  
+Note that this is an experimental feature and subject to
+change in future versions of GCC.  See
+@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
  
-@node Implementing MASTER construct
-@section Implementing MASTER construct
  
-@smallexample
-if (omp_get_thread_num () == 0)
-  block
-@end smallexample
  
-Alternately, we generate two copies of the parallel subfunction
-and only include this in the version run by the master thread.
-Surely this is not worthwhile though...
+@c ---------------------------------------------------------------------
+@c OpenACC Runtime Library Routines
+@c ---------------------------------------------------------------------
  
+@node OpenACC Runtime Library Routines
+@chapter OpenACC Runtime Library Routines
  
+The runtime routines described here are defined by section 3 of the OpenACC
+specifications in version 2.0.
+They have C linkage, and do not throw exceptions.
+Generally, they are available only for the host, with the exception of
+@code{acc_on_device}, which is available for both the host and the
+acceleration device.
  
-@node Implementing CRITICAL construct
-@section Implementing CRITICAL construct
+@menu
+* acc_get_num_devices::         Get number of devices for the given device
+                                type.
+* acc_set_device_type::         Set type of device accelerator to use.
+* acc_get_device_type::         Get type of device accelerator to be used.
+* acc_set_device_num::          Set device number to use.
+* acc_get_device_num::          Get device number to be used.
+* acc_async_test::              Tests for completion of a specific asynchronous
+                                operation.
+* acc_async_test_all::          Tests for completion of all asychronous
+                                operations.
+* acc_wait::                    Wait for completion of a specific asynchronous
+                                operation.
+* acc_wait_all::                Waits for completion of all asyncrhonous
+                                operations.
+* acc_wait_all_async::          Wait for completion of all asynchronous
+                                operations.
+* acc_wait_async::              Wait for completion of asynchronous operations.
+* acc_init::                    Initialize runtime for a specific device type.
+* acc_shutdown::                Shuts down the runtime for a specific device
+                                type.
+* acc_on_device::               Whether executing on a particular device
+* acc_malloc::                  Allocate device memory.
+* acc_free::                    Free device memory.
+* acc_copyin::                  Allocate device memory and copy host memory to
+                                it.
+* acc_present_or_copyin::       If the data is not present on the device,
+                                allocate device memory and copy from host
+                                memory.
+* acc_create::                  Allocate device memory and map it to host
+                                memory.
+* acc_present_or_create::       If the data is not present on the device,
+                                allocate device memory and map it to host
+                                memory.
+* acc_copyout::                 Copy device memory to host memory.
+* acc_delete::                  Free device memory.
+* acc_update_device::           Update device memory from mapped host memory.
+* acc_update_self::             Update host memory from mapped device memory.
+* acc_map_data::                Map previously allocated device memory to host
+                                memory.
+* acc_unmap_data::              Unmap device memory from host memory.
+* acc_deviceptr::               Get device pointer associated with specific
+                                host address.
+* acc_hostptr::                 Get host pointer associated with specific
+                                device address.
+* acc_is_present::              Indiciate whether host variable / array is
+                                present on device.
+* acc_memcpy_to_device::        Copy host memory to device memory.
+* acc_memcpy_from_device::      Copy device memory to host memory.
+
+API routines for target platforms.
+
+* acc_get_current_cuda_device:: Get CUDA device handle.
+* acc_get_current_cuda_context::Get CUDA context handle.
+* acc_get_cuda_stream::         Get CUDA stream handle.
+* acc_set_cuda_stream::         Set CUDA stream handle.
+
+API routines for the OpenACC Profiling Interface.
+
+* acc_prof_register::           Register callbacks.
+* acc_prof_unregister::         Unregister callbacks.
+* acc_prof_lookup::             Obtain inquiry functions.
+* acc_register_library::        Library registration.
+@end menu
  
-Without a specified name,
  
-@smallexample
-  void GOMP_critical_start (void);
-  void GOMP_critical_end (void);
-@end smallexample
  
-so that we don't get COPY relocations from libgomp to the main
-application.
+@node acc_get_num_devices
+@section @code{acc_get_num_devices} -- Get number of devices for given device type
+@table @asis
+@item @emph{Description}
+This function returns a value indicating the number of devices available
+for the device type specified in @var{devicetype}. 
  
-With a specified name, use omp_set_lock and omp_unset_lock with
-name being transformed into a variable declared like
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
+@end multitable
  
-@smallexample
-  omp_lock_t gomp_critical_user_<name> __attribute__((common))
-@end smallexample
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
+@item                  @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
  
-Ideally the ABI would specify that all zero is a valid unlocked
-state, and so we wouldn't need to initialize this at
-startup.
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.1.
+@end table
  
  
  
-@node Implementing ATOMIC construct
-@section Implementing ATOMIC construct
+@node acc_set_device_type
+@section @code{acc_set_device_type} -- Set type of device accelerator to use.
+@table @asis
+@item @emph{Description}
+This function indicates to the runtime library which device typr, specified
+in @var{devicetype}, to use when executing a parallel or kernels region. 
  
-The target should implement the @code{__sync} builtins.
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
+@end multitable
  
-Failing that we could add
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
  
-@smallexample
-  void GOMP_atomic_enter (void)
-  void GOMP_atomic_exit (void)
-@end smallexample
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.2.
+@end table
  
-which reuses the regular lock code, but with yet another lock
-object private to the library.
  
  
+@node acc_get_device_type
+@section @code{acc_get_device_type} -- Get type of device accelerator to be used.
+@table @asis
+@item @emph{Description}
+This function returns what device type will be used when executing a
+parallel or kernels region.
  
-@node Implementing FLUSH construct
-@section Implementing FLUSH construct
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
+@end multitable
  
-Expands to the @code{__sync_synchronize} builtin.
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_type(void)}
+@item                  @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
+@end multitable
  
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.3.
+@end table
  
  
-@node Implementing BARRIER construct
-@section Implementing BARRIER construct
  
-@smallexample
-  void GOMP_barrier (void)
-@end smallexample
+@node acc_set_device_num
+@section @code{acc_set_device_num} -- Set device number to use.
+@table @asis
+@item @emph{Description}
+This function will indicate to the runtime which device number,
+specified by @var{num}, associated with the specifed device
+type @var{devicetype}.
  
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_set_device_num(int num, acc_device_t devicetype);}
+@end multitable
  
-@node Implementing THREADPRIVATE construct
-@section Implementing THREADPRIVATE construct
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
+@item                   @tab @code{integer devicenum}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@end multitable
  
-In _most_ cases we can map this directly to @code{__thread}.  Except
-that OMP allows constructors for C++ objects.  We can either
-refuse to support this (how often is it used?) or we can 
-implement something akin to .ctors.
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.4.
+@end table
  
-Even more ideally, this ctor feature is handled by extensions
-to the main pthreads library.  Failing that, we can have a set
-of entry points to register ctor functions to be called.
  
  
+@node acc_get_device_num
+@section @code{acc_get_device_num} -- Get device number to be used.
+@table @asis
+@item @emph{Description}
+This function returns which device number associated with the specified device
+type @var{devicetype}, will be used when executing a parallel or kernels
+region.
  
-@node Implementing PRIVATE clause
-@section Implementing PRIVATE clause
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
+@end multitable
  
-In association with a PARALLEL, or within the lexical extent
-of a PARALLEL block, the variable becomes a local variable in
-the parallel subfunction.
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
+@item                   @tab @code{integer(kind=acc_device_kind) devicetype}
+@item                   @tab @code{integer acc_get_device_num}
+@end multitable
  
-In association with FOR or SECTIONS blocks, create a new
-automatic variable within the current function.  This preserves
-the semantic of new variable creation.
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.5.
+@end table
  
  
  
-@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
+@node acc_async_test
+@section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function tests for completion of the asynchrounous operation specified
+in @var{arg}. In C/C++, a non-zero value will be returned to indicate
+the specified asynchronous operation has completed. While Fortran will return
+a @code{true}. If the asynchrounous operation has not completed, C/C++ returns
+a zero and Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test(arg)}
+@item                   @tab @code{integer(kind=acc_handle_kind) arg}
+@item                   @tab @code{logical acc_async_test}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.6.
+@end table
+
+
+
+@node acc_async_test_all
+@section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function tests for completion of all asynchrounous operations.
+In C/C++, a non-zero value will be returned to indicate all asynchronous
+operations have completed. While Fortran will return a @code{true}. If
+any asynchronous operation has not completed, C/C++ returns a zero and
+Fortran returns a @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_async_test()}
+@item                   @tab @code{logical acc_get_device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.7.
+@end table
+
+
+
+@node acc_wait
+@section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
+@table @asis
+@item @emph{Description}
+This function waits for completion of the asynchronous operation
+specified in @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait(arg);}
+@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
+@item                   @tab @code{integer(acc_handle_kind) arg}
+@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
+@item                                               @tab @code{integer(acc_handle_kind) arg}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.8.
+@end table
+
+
+
+@node acc_wait_all
+@section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function waits for the completion of all asynchronous operations.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all(void);}
+@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
+@item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.10.
+@end table
+
+
+
+@node acc_wait_all_async
+@section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on the queue @var{async} for any
+and all asynchronous operations that have been previously enqueued on
+any queue.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
+@item                   @tab @code{integer(acc_handle_kind) async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.11.
+@end table
+
+
+
+@node acc_wait_async
+@section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
+@table @asis
+@item @emph{Description}
+This function enqueues a wait operation on queue @var{async} for any and all
+asynchronous operations enqueued on queue @var{arg}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
+@item                   @tab @code{integer(acc_handle_kind) arg, async}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.9.
+@end table
+
+
+
+@node acc_init
+@section @code{acc_init} -- Initialize runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function initializes the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.12.
+@end table
+
+
+
+@node acc_shutdown
+@section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
+@table @asis
+@item @emph{Description}
+This function shuts down the runtime for the device type specified in
+@var{devicetype}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.13.
+@end table
+
+
+
+@node acc_on_device
+@section @code{acc_on_device} -- Whether executing on a particular device
+@table @asis
+@item @emph{Description}:
+This function returns whether the program is executing on a particular
+device specified in @var{devicetype}. In C/C++ a non-zero value is
+returned to indicate the device is execiting on the specified device type.
+In Fortran, @code{true} will be returned. If the program is not executing
+on the specified device type C/C++ will return a zero, while Fortran will
+return @code{false}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
+@item                   @tab @code{integer(acc_device_kind) devicetype}
+@item                   @tab @code{logical acc_on_device}
+@end multitable
+
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.14.
+@end table
+
+
+
+@node acc_malloc
+@section @code{acc_malloc} -- Allocate device memory.
+@table @asis
+@item @emph{Description}
+This function allocates @var{len} bytes of device memory. It returns
+the device address of the allocated memory.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.15.
+@end table
+
+
+
+@node acc_free
+@section @code{acc_free} -- Free device memory.
+@table @asis
+@item @emph{Description}
+Free previously allocated device memory at the device address @code{a}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.16.
+@end table
+
+
+
+@node acc_copyin
+@section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
+@table @asis
+@item @emph{Description}
+In C/C++, this function allocates @var{len} bytes of device memory
+and maps it to the specified host address in @var{a}. The device
+address of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a
+variable or array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.17.
+@end table
+
+
+
+@node acc_present_or_copyin
+@section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and the host memory copied. The device address of
+the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
+@item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.18.
+@end table
+
+
+
+@node acc_create
+@section @code{acc_create} -- Allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function allocates device memory and maps it to host memory specified
+by the host address @var{a} with a length of @var{len} bytes. In C/C++,
+the function returns the device address of the allocated device memory.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_create(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.19.
+@end table
+
+
+
+@node acc_present_or_create
+@section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
+@table @asis
+@item @emph{Description}
+This function tests if the host data specifed by @var{a} and of length
+@var{len} is present or not. If it is not present, then device memory
+will be allocated and mapped to host memory. In C/C++, the device address
+of the newly allocated device memory is returned.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
+@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.20.
+@end table
+
+
+
+@node acc_copyout
+@section @code{acc_copyout} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies mapped device memory to host memory which is specified
+by host address @var{a} for a length @var{len} bytes in C/C++.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.21.
+@end table
+
+
+
+@node acc_delete
+@section @code{acc_delete} -- Free device memory.
+@table @asis
+@item @emph{Description}
+This function frees previously allocated device memory specified by
+the device address @var{a} and the length of @var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.22.
+@end table
+
+
+
+@node acc_update_device
+@section @code{acc_update_device} -- Update device memory from mapped host memory.
+@table @asis
+@item @emph{Description}
+This function updates the device copy from the previously mapped host memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.23.
+@end table
+
+
+
+@node acc_update_self
+@section @code{acc_update_self} -- Update host memory from mapped device memory.
+@table @asis
+@item @emph{Description}
+This function updates the host copy from the previously mapped device memory.
+The host memory is specified with the host address @var{a} and a length of
+@var{len} bytes.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.24.
+@end table
+
+
+
+@node acc_map_data
+@section @code{acc_map_data} -- Map previously allocated device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function maps previously allocated device and host memory. The device
+memory is specified with the device address @var{d}. The host memory is
+specified with the host address @var{h} and a length of @var{len}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.25.
+@end table
+
+
+
+@node acc_unmap_data
+@section @code{acc_unmap_data} -- Unmap device memory from host memory.
+@table @asis
+@item @emph{Description}
+This function unmaps previously mapped device and host memory. The latter
+specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.26.
+@end table
+
+
+
+@node acc_deviceptr
+@section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
+@table @asis
+@item @emph{Description}
+This function returns the device address that has been mapped to the
+host address specified by @var{h}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.27.
+@end table
+
+
+
+@node acc_hostptr
+@section @code{acc_hostptr} -- Get host pointer associated with specific device address.
+@table @asis
+@item @emph{Description}
+This function returns the host address that has been mapped to the
+device address specified by @var{d}.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.28.
+@end table
+
+
+
+@node acc_is_present
+@section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
+@table @asis
+@item @emph{Description}
+This function indicates whether the specified host address in @var{a} and a
+length of @var{len} bytes is present on the device. In C/C++, a non-zero
+value is returned to indicate the presence of the mapped memory on the
+device. A zero is returned to indicate the memory is not mapped on the
+device.
+
+In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
+a contiguous array section. The second form @var{a} specifies a variable or
+array element and @var{len} specifies the length in bytes. If the host
+memory is mapped to device memory, then a @code{true} is returned. Otherwise,
+a @code{false} is return to indicate the mapped memory is not present.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{function acc_is_present(a)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{logical acc_is_present}
+@item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
+@item                   @tab @code{type, dimension(:[,:]...) :: a}
+@item                   @tab @code{integer len}
+@item                   @tab @code{logical acc_is_present}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.29.
+@end table
+
+
+
+@node acc_memcpy_to_device
+@section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} to
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.30.
+@end table
+
+
+
+@node acc_memcpy_from_device
+@section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
+@table @asis
+@item @emph{Description}
+This function copies host memory specified by host address of @var{src} from
+device memory specified by the device address @var{dest} for a length of
+@var{bytes} bytes.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(d_void *dest, h_void *src, size_t bytes);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+3.2.31.
+@end table
+
+
+
+@node acc_get_current_cuda_device
+@section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA device handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.1.
+@end table
+
+
+
+@node acc_get_current_cuda_context
+@section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA context handle. This handle is the same
+as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.2.
+@end table
+
+
+
+@node acc_get_cuda_stream
+@section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function returns the CUDA stream handle for the queue @var{async}.
+This handle is the same as used by the CUDA Runtime or Driver API's.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.3.
+@end table
+
+
+
+@node acc_set_cuda_stream
+@section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
+@table @asis
+@item @emph{Description}
+This function associates the stream handle specified by @var{stream} with
+the queue @var{async}.
+
+This cannot be used to change the stream handle associated with
+@code{acc_async_sync}.
+
+The return value is not specified.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+A.2.1.4.
+@end table
+
+
+
+@node acc_prof_register
+@section @code{acc_prof_register} -- Register callbacks.
+@table @asis
+@item @emph{Description}:
+This function registers callbacks.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@node acc_prof_unregister
+@section @code{acc_prof_unregister} -- Unregister callbacks.
+@table @asis
+@item @emph{Description}:
+This function unregisters callbacks.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@node acc_prof_lookup
+@section @code{acc_prof_lookup} -- Obtain inquiry functions.
+@table @asis
+@item @emph{Description}:
+Function to obtain inquiry functions.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@node acc_register_library
+@section @code{acc_register_library} -- Library registration.
+@table @asis
+@item @emph{Description}:
+Function for library registration.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
+@end multitable
+
+@item @emph{See also}:
+@ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+5.3.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Environment Variables
+@c ---------------------------------------------------------------------
+
+@node OpenACC Environment Variables
+@chapter OpenACC Environment Variables
+
+The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
+are defined by section 4 of the OpenACC specification in version 2.0.
+The variable @env{ACC_PROFLIB}
+is defined by section 4 of the OpenACC specification in version 2.6.
+The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
+
+@menu
+* ACC_DEVICE_TYPE::
+* ACC_DEVICE_NUM::
+* ACC_PROFLIB::
+* GCC_ACC_NOTIFY::
+@end menu
+
+
+
+@node ACC_DEVICE_TYPE
+@section @code{ACC_DEVICE_TYPE}
+@table @asis
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+4.1.
+@end table
+
+
+
+@node ACC_DEVICE_NUM
+@section @code{ACC_DEVICE_NUM}
+@table @asis
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.0}, section
+4.2.
+@end table
+
+
+
+@node ACC_PROFLIB
+@section @code{ACC_PROFLIB}
+@table @asis
+@item @emph{See also}:
+@ref{acc_register_library}, @ref{OpenACC Profiling Interface}
+
+@item @emph{Reference}:
+@uref{https://www.openacc.org, OpenACC specification v2.6}, section
+4.3.
+@end table
+
+
+
+@node GCC_ACC_NOTIFY
+@section @code{GCC_ACC_NOTIFY}
+@table @asis
+@item @emph{Description}:
+Print debug information pertaining to the accelerator.
+@end table
+
+
+
+@c ---------------------------------------------------------------------
+@c CUDA Streams Usage
+@c ---------------------------------------------------------------------
+
+@node CUDA Streams Usage
+@chapter CUDA Streams Usage
+
+This applies to the @code{nvptx} plugin only.
+
+The library provides elements that perform asynchronous movement of
+data and asynchronous operation of computing constructs.  This
+asynchronous functionality is implemented by making use of CUDA
+streams@footnote{See "Stream Management" in "CUDA Driver API",
+TRM-06703-001, Version 5.5, for additional information}.
+
+The primary means by that the asychronous functionality is accessed
+is through the use of those OpenACC directives which make use of the
+@code{async} and @code{wait} clauses.  When the @code{async} clause is
+first used with a directive, it creates a CUDA stream.  If an
+@code{async-argument} is used with the @code{async} clause, then the
+stream is associated with the specified @code{async-argument}.
+
+Following the creation of an association between a CUDA stream and the
+@code{async-argument} of an @code{async} clause, both the @code{wait}
+clause and the @code{wait} directive can be used.  When either the
+clause or directive is used after stream creation, it creates a
+rendezvous point whereby execution waits until all operations
+associated with the @code{async-argument}, that is, stream, have
+completed.
+
+Normally, the management of the streams that are created as a result of
+using the @code{async} clause, is done without any intervention by the
+caller.  This implies the association between the @code{async-argument}
+and the CUDA stream will be maintained for the lifetime of the program.
+However, this association can be changed through the use of the library
+function @code{acc_set_cuda_stream}.  When the function
+@code{acc_set_cuda_stream} is called, the CUDA stream that was
+originally associated with the @code{async} clause will be destroyed.
+Caution should be taken when changing the association as subsequent
+references to the @code{async-argument} refer to a different
+CUDA stream.
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Library Interoperability
+@c ---------------------------------------------------------------------
+
+@node OpenACC Library Interoperability
+@chapter OpenACC Library Interoperability
+
+@section Introduction
+
+The OpenACC library uses the CUDA Driver API, and may interact with
+programs that use the Runtime library directly, or another library
+based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
+"Interactions with the CUDA Driver API" in
+"CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
+Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
+for additional information on library interoperability.}.
+This chapter describes the use cases and what changes are
+required in order to use both the OpenACC library and the CUBLAS and Runtime
+libraries within a program.
+
+@section First invocation: NVIDIA CUBLAS library API
+
+In this first use case (see below), a function in the CUBLAS library is called
+prior to any of the functions in the OpenACC library. More specifically, the
+function @code{cublasCreate()}.
+
+When invoked, the function initializes the library and allocates the
+hardware resources on the host and the device on behalf of the caller. Once
+the initialization and allocation has completed, a handle is returned to the
+caller. The OpenACC library also requires initialization and allocation of
+hardware resources. Since the CUBLAS library has already allocated the
+hardware resources for the device, all that is left to do is to initialize
+the OpenACC library and acquire the hardware resources on the host.
+
+Prior to calling the OpenACC function that initializes the library and
+allocate the host hardware resources, you need to acquire the device number
+that was allocated during the call to @code{cublasCreate()}. The invoking of the
+runtime library function @code{cudaGetDevice()} accomplishes this. Once
+acquired, the device number is passed along with the device type as
+parameters to the OpenACC library function @code{acc_set_device_num()}.
+
+Once the call to @code{acc_set_device_num()} has completed, the OpenACC
+library uses the  context that was created during the call to
+@code{cublasCreate()}. In other words, both libraries will be sharing the
+same context.
+
+@smallexample
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Get the device number */
+    e = cudaGetDevice(&dev);
+    if (e != cudaSuccess)
+    @{
+        fprintf(stderr, "cudaGetDevice failed %d\n", e);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Initialize OpenACC library and use device 'dev' */
+    acc_set_device_num(dev, acc_device_nvidia);
+
+@end smallexample
+@center Use Case 1 
+
+@section First invocation: OpenACC library API
+
+In this second use case (see below), a function in the OpenACC library is
+called prior to any of the functions in the CUBLAS library. More specificially,
+the function @code{acc_set_device_num()}.
+
+In the use case presented here, the function @code{acc_set_device_num()}
+is used to both initialize the OpenACC library and allocate the hardware
+resources on the host and the device. In the call to the function, the
+call parameters specify which device to use and what device
+type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
+is but one method to initialize the OpenACC library and allocate the
+appropriate hardware resources. Other methods are available through the
+use of environment variables and these will be discussed in the next section.
+
+Once the call to @code{acc_set_device_num()} has completed, other OpenACC
+functions can be called as seen with multiple calls being made to
+@code{acc_copyin()}. In addition, calls can be made to functions in the
+CUBLAS library. In the use case a call to @code{cublasCreate()} is made
+subsequent to the calls to @code{acc_copyin()}.
+As seen in the previous use case, a call to @code{cublasCreate()}
+initializes the CUBLAS library and allocates the hardware resources on the
+host and the device.  However, since the device has already been allocated,
+@code{cublasCreate()} will only initialize the CUBLAS library and allocate
+the appropriate hardware resources on the host. The context that was created
+as part of the OpenACC initialization is shared with the CUBLAS library,
+similarly to the first use case.
+
+@smallexample
+    dev = 0;
+
+    acc_set_device_num(dev, acc_device_nvidia);
+
+    /* Copy the first set to the device */
+    d_X = acc_copyin(&h_X[0], N * sizeof (float));
+    if (d_X == NULL)
+    @{ 
+        fprintf(stderr, "copyin error h_X\n");
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Copy the second set to the device */
+    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
+    if (d_Y == NULL)
+    @{ 
+        fprintf(stderr, "copyin error h_Y1\n");
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Create the handle */
+    s = cublasCreate(&h);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasCreate failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Perform saxpy using CUBLAS library function */
+    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
+    if (s != CUBLAS_STATUS_SUCCESS)
+    @{
+        fprintf(stderr, "cublasSaxpy failed %d\n", s);
+        exit(EXIT_FAILURE);
+    @}
+
+    /* Copy the results from the device */
+    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
+
+@end smallexample
+@center Use Case 2
+
+@section OpenACC library and environment variables
+
+There are two environment variables associated with the OpenACC library
+that may be used to control the device type and device number:
+@env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respecively. These two
+environement variables can be used as an alternative to calling
+@code{acc_set_device_num()}. As seen in the second use case, the device
+type and device number were specified using @code{acc_set_device_num()}.
+If however, the aforementioned environment variables were set, then the
+call to @code{acc_set_device_num()} would not be required.
+
+
+The use of the environment variables is only relevant when an OpenACC function
+is called prior to a call to @code{cudaCreate()}. If @code{cudaCreate()}
+is called prior to a call to an OpenACC function, then you must call
+@code{acc_set_device_num()}@footnote{More complete information
+about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
+sections 4.1 and 4.2 of the @uref{https://www.openacc.org, OpenACC}
+Application Programming Interface”, Version 2.0.}
+
+
+
+@c ---------------------------------------------------------------------
+@c OpenACC Profiling Interface
+@c ---------------------------------------------------------------------
+
+@node OpenACC Profiling Interface
+@chapter OpenACC Profiling Interface
+
+@section Implementation Status and Implementation-Defined Behavior
+
+We're implementing the OpenACC Profiling Interface as defined by the
+OpenACC 2.6 specification.  We're clarifying some aspects here as
+@emph{implementation-defined behavior}, while they're still under
+discussion within the OpenACC Technical Committee.
+
+This implementation is tuned to keep the performance impact as low as
+possible for the (very common) case that the Profiling Interface is
+not enabled.  This is relevant, as the Profiling Interface affects all
+the @emph{hot} code paths (in the target code, not in the offloaded
+code).  Users of the OpenACC Profiling Interface can be expected to
+understand that performance will be impacted to some degree once the
+Profiling Interface has gotten enabled: for example, because of the
+@emph{runtime} (libgomp) calling into a third-party @emph{library} for
+every event that has been registered.
+
+We're not yet accounting for the fact that @cite{OpenACC events may
+occur during event processing}.
+
+We're not yet implementing initialization via a
+@code{acc_register_library} function that is either statically linked
+in, or dynamically via @env{LD_PRELOAD}.
+Initialization via @code{acc_register_library} functions dynamically
+loaded via the @env{ACC_PROFLIB} environment variable does work, as
+does directly calling @code{acc_prof_register},
+@code{acc_prof_unregister}, @code{acc_prof_lookup}.
+
+As currently there are no inquiry functions defined, calls to
+@code{acc_prof_lookup} will always return @code{NULL}.
+
+There aren't separate @emph{start}, @emph{stop} events defined for the
+event types @code{acc_ev_create}, @code{acc_ev_delete},
+@code{acc_ev_alloc}, @code{acc_ev_free}.  It's not clear if these
+should be triggered before or after the actual device-specific call is
+made.  We trigger them after.
+
+Remarks about data provided to callbacks:
+
+@table @asis
+
+@item @code{acc_prof_info.event_type}
+It's not clear if for @emph{nested} event callbacks (for example,
+@code{acc_ev_enqueue_launch_start} as part of a parent compute
+construct), this should be set for the nested event
+(@code{acc_ev_enqueue_launch_start}), or if the value of the parent
+construct should remain (@code{acc_ev_compute_construct_start}).  In
+this implementation, the value will generally correspond to the
+innermost nested event type.
+
+@item @code{acc_prof_info.device_type}
+@itemize
+
+@item
+For @code{acc_ev_compute_construct_start}, and in presence of an
+@code{if} clause with @emph{false} argument, this will still refer to
+the offloading device type.
+It's not clear if that's the expected behavior.
+
+@item
+Complementary to the item before, for
+@code{acc_ev_compute_construct_end}, this is set to
+@code{acc_device_host} in presence of an @code{if} clause with
+@emph{false} argument.
+It's not clear if that's the expected behavior.
+
+@end itemize
+
+@item @code{acc_prof_info.thread_id}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.async}
+@itemize
+
+@item
+Not yet implemented correctly for
+@code{acc_ev_compute_construct_start}.
+
+@item
+In a compute construct, for host-fallback
+execution/@code{acc_device_host} it will always be
+@code{acc_async_sync}.
+It's not clear if that's the expected behavior.
+
+@item
+For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
+it will always be @code{acc_async_sync}.
+It's not clear if that's the expected behavior.
+
+@end itemize
+
+@item @code{acc_prof_info.async_queue}
+There is no @cite{limited number of asynchronous queues} in libgomp.
+This will always have the same value as @code{acc_prof_info.async}.
+
+@item @code{acc_prof_info.src_file}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_prof_info.func_name}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_prof_info.line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.end_line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.func_line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_prof_info.func_end_line_no}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
+Relating to @code{acc_prof_info.event_type} discussed above, in this
+implementation, this will always be the same value as
+@code{acc_prof_info.event_type}.
+
+@item @code{acc_event_info.*.parent_construct}
+@itemize
+
+@item
+Will be @code{acc_construct_parallel} for all OpenACC compute
+constructs as well as many OpenACC Runtime API calls; should be the
+one matching the actual construct, or
+@code{acc_construct_runtime_api}, respectively.
+
+@item
+Will be @code{acc_construct_enter_data} or
+@code{acc_construct_exit_data} when processing variable mappings
+specified in OpenACC @emph{declare} directives; should be
+@code{acc_construct_declare}.
+
+@item
+For implicit @code{acc_ev_device_init_start},
+@code{acc_ev_device_init_end}, and explicit as well as implicit
+@code{acc_ev_alloc}, @code{acc_ev_free},
+@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
+@code{acc_ev_enqueue_download_start}, and
+@code{acc_ev_enqueue_download_end}, will be
+@code{acc_construct_parallel}; should reflect the real parent
+construct.
+
+@end itemize
+
+@item @code{acc_event_info.*.implicit}
+For @code{acc_ev_alloc}, @code{acc_ev_free},
+@code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
+@code{acc_ev_enqueue_download_start}, and
+@code{acc_ev_enqueue_download_end}, this currently will be @code{1}
+also for explicit usage.
+
+@item @code{acc_event_info.data_event.var_name}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_event_info.data_event.host_ptr}
+For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always
+@code{NULL}.
+
+@item @code{typedef union acc_api_info}
+@dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
+Information}.  This should obviously be @code{typedef @emph{struct}
+acc_api_info}.
+
+@item @code{acc_api_info.device_api}
+Possibly not yet implemented correctly for
+@code{acc_ev_compute_construct_start},
+@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
+will always be @code{acc_device_api_none} for these event types.
+For @code{acc_ev_enter_data_start}, it will be
+@code{acc_device_api_none} in some cases.
+
+@item @code{acc_api_info.device_type}
+Always the same as @code{acc_prof_info.device_type}.
+
+@item @code{acc_api_info.vendor}
+Always @code{-1}; not yet implemented.
+
+@item @code{acc_api_info.device_handle}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_api_info.context_handle}
+Always @code{NULL}; not yet implemented.
+
+@item @code{acc_api_info.async_handle}
+Always @code{NULL}; not yet implemented.
+
+@end table
+
+Remarks about certain event types:
+
+@table @asis
+
+@item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
+@itemize
+
+@item
+@c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
+@c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
+@c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
+Whan a compute construct triggers implicit
+@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
+events, they currently aren't @emph{nested within} the corresponding
+@code{acc_ev_compute_construct_start} and
+@code{acc_ev_compute_construct_end}, but they're currently observed
+@emph{before} @code{acc_ev_compute_construct_start}.
+It's not clear what to do: the standard asks us provide a lot of
+details to the @code{acc_ev_compute_construct_start} callback, without
+(implicitly) initializing a device before?
+
+@item
+Callbacks for these event types will not be invoked for calls to the
+@code{acc_set_device_type} and @code{acc_set_device_num} functions.
+It's not clear if they should be.
+
+@end itemize
+
+@item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
+@itemize
+
+@item
+Callbacks for these event types will also be invoked for OpenACC
+@emph{host_data} constructs.
+It's not clear if they should be.
+
+@item
+Callbacks for these event types will also be invoked when processing
+variable mappings specified in OpenACC @emph{declare} directives.
+It's not clear if they should be.
+
+@end itemize
+
+@end table
+
+Callbacks for the following event types will be invoked, but dispatch
+and information provided therein has not yet been thoroughly reviewed:
+
+@itemize
+@item @code{acc_ev_alloc}
+@item @code{acc_ev_free}
+@item @code{acc_ev_update_start}, @code{acc_ev_update_end}
+@item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
+@item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
+@end itemize
+
+During device initialization, and finalization, respectively,
+callbacks for the following event types will not yet be invoked:
+
+@itemize
+@item @code{acc_ev_alloc}
+@item @code{acc_ev_free}
+@end itemize
+
+Callbacks for the following event types have not yet been implemented,
+so currently won't be invoked:
+
+@itemize
+@item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
+@item @code{acc_ev_runtime_shutdown}
+@item @code{acc_ev_create}, @code{acc_ev_delete}
+@item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
+@end itemize
+
+For the following runtime library functions, not all expected
+callbacks will be invoked (mostly concerning implicit device
+initialization):
+
+@itemize
+@item @code{acc_get_num_devices}
+@item @code{acc_set_device_type}
+@item @code{acc_get_device_type}
+@item @code{acc_set_device_num}
+@item @code{acc_get_device_num}
+@item @code{acc_init}
+@item @code{acc_shutdown}
+@end itemize
+
+Aside from implicit device initialization, for the following runtime
+library functions, no callbacks will be invoked for shared-memory
+offloading devices (it's not clear if they should be):
+
+@itemize
+@item @code{acc_malloc}
+@item @code{acc_free}
+@item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
+@item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
+@item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
+@item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
+@item @code{acc_update_device}, @code{acc_update_device_async}
+@item @code{acc_update_self}, @code{acc_update_self_async}
+@item @code{acc_map_data}, @code{acc_unmap_data}
+@item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
+@item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
+@end itemize
+
+
+
+@c ---------------------------------------------------------------------
+@c The libgomp ABI
+@c ---------------------------------------------------------------------
+
+@node The libgomp ABI
+@chapter The libgomp ABI
+
+The following sections present notes on the external ABI as 
+presented by libgomp.  Only maintainers should need them.
+
+@menu
+* Implementing MASTER construct::
+* Implementing CRITICAL construct::
+* Implementing ATOMIC construct::
+* Implementing FLUSH construct::
+* Implementing BARRIER construct::
+* Implementing THREADPRIVATE construct::
+* Implementing PRIVATE clause::
+* Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
+* Implementing REDUCTION clause::
+* Implementing PARALLEL construct::
+* Implementing FOR construct::
+* Implementing ORDERED construct::
+* Implementing SECTIONS construct::
+* Implementing SINGLE construct::
+* Implementing OpenACC's PARALLEL construct::
+@end menu
+
+
+@node Implementing MASTER construct
+@section Implementing MASTER construct
+
+@smallexample
+if (omp_get_thread_num () == 0)
+  block
+@end smallexample
+
+Alternately, we generate two copies of the parallel subfunction
+and only include this in the version run by the master thread.
+Surely this is not worthwhile though...
+
+
+
+@node Implementing CRITICAL construct
+@section Implementing CRITICAL construct
+
+Without a specified name,
+
+@smallexample
+  void GOMP_critical_start (void);
+  void GOMP_critical_end (void);
+@end smallexample
+
+so that we don't get COPY relocations from libgomp to the main
+application.
+
+With a specified name, use omp_set_lock and omp_unset_lock with
+name being transformed into a variable declared like
+
+@smallexample
+  omp_lock_t gomp_critical_user_<name> __attribute__((common))
+@end smallexample
+
+Ideally the ABI would specify that all zero is a valid unlocked
+state, and so we wouldn't need to initialize this at
+startup.
+
+
+
+@node Implementing ATOMIC construct
+@section Implementing ATOMIC construct
+
+The target should implement the @code{__sync} builtins.
+
+Failing that we could add
+
+@smallexample
+  void GOMP_atomic_enter (void)
+  void GOMP_atomic_exit (void)
+@end smallexample
+
+which reuses the regular lock code, but with yet another lock
+object private to the library.
+
+
+
+@node Implementing FLUSH construct
+@section Implementing FLUSH construct
+
+Expands to the @code{__sync_synchronize} builtin.
+
+
+
+@node Implementing BARRIER construct
+@section Implementing BARRIER construct
+
+@smallexample
+  void GOMP_barrier (void)
+@end smallexample
+
+
+@node Implementing THREADPRIVATE construct
+@section Implementing THREADPRIVATE construct
+
+In _most_ cases we can map this directly to @code{__thread}.  Except
+that OMP allows constructors for C++ objects.  We can either
+refuse to support this (how often is it used?) or we can 
+implement something akin to .ctors.
+
+Even more ideally, this ctor feature is handled by extensions
+to the main pthreads library.  Failing that, we can have a set
+of entry points to register ctor functions to be called.
+
+
+
+@node Implementing PRIVATE clause
+@section Implementing PRIVATE clause
+
+In association with a PARALLEL, or within the lexical extent
+of a PARALLEL block, the variable becomes a local variable in
+the parallel subfunction.
+
+In association with FOR or SECTIONS blocks, create a new
+automatic variable within the current function.  This preserves
+the semantic of new variable creation.
+
+
+
+@node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
  @section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
  
  This seems simple enough for PARALLEL blocks.  Create a private 
@@ -2069,16 +3879,26 @@ becomes
  
  
  
+@node Implementing OpenACC's PARALLEL construct
+@section Implementing OpenACC's PARALLEL construct
+
+@smallexample
+  void GOACC_parallel ()
+@end smallexample
+
+
+
  @c ---------------------------------------------------------------------
-@c 
+@c Reporting Bugs
  @c ---------------------------------------------------------------------
  
  @node Reporting Bugs
  @chapter Reporting Bugs
  
-Bugs in the GNU OpenMP implementation should be reported via 
-@uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  For all cases, please add 
-"openmp" to the keywords field in the bug report.
+Bugs in the GNU Offloading and Multi Processing Runtime Library should
+be reported via @uref{http://gcc.gnu.org/bugzilla/, Bugzilla}.  Please add
+"openacc", or "openmp", or both to the keywords field in the bug
+report, as appropriate.