Copyright 1988-2022 Free Software Foundation, Inc.
This is part of the GCC manual.
For copying conditions, see the copyright.rst file.
On the hardware side, there is the hierarchy (fine to coarse):

* thread
* warp
* thread block
* streaming multiprocessor
All OpenMP and OpenACC levels are used, i.e.

* OpenMP's simd and OpenACC's vector map to threads
* OpenMP's threads ('parallel') and OpenACC's workers map to warps
* OpenMP's teams and OpenACC's gang use a threadpool with the
  size of the number of teams or gangs, respectively.
* The ``warp_size`` is always 32.
* CUDA kernel launched: ``dim={#teams,1,1}, blocks={#threads,warp_size,1}``.
Additional information can be obtained by setting the environment variable
``GOMP_DEBUG=1`` (very verbose; grep for ``kernel.*launch`` for launch
parameters).
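For example (a sketch; ``./a.out`` stands for any offloading test program):

```shell
# Run with verbose libgomp debugging and filter for the kernel launches.
GOMP_DEBUG=1 ./a.out 2>&1 | grep -i 'kernel.*launch'
```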
GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA,
which caches the JIT-compiled code in the user's directory (see the CUDA
documentation; the cache can be tuned by the environment variables
``CUDA_CACHE_{DISABLE,MAXSIZE,PATH}``).
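As a sketch (the values below are arbitrary examples, not defaults):

```shell
# Tune the CUDA JIT cache via environment variables.
export CUDA_CACHE_DISABLE=0                     # 1 disables JIT caching
export CUDA_CACHE_MAXSIZE=1073741824            # example: 1 GiB size limit
export CUDA_CACHE_PATH="$HOME/.nv/ComputeCache" # example cache location
```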
Note: While PTX ISA is generic, the ``-mptx=`` and ``-march=`` command-line
options still affect the generated PTX ISA code and, thus, the requirements
on CUDA version and hardware.
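For instance, a compilation that pins the PTX ISA target might look like this
(a sketch; ``sm_70`` and ``7.0`` are example values, and the
``-march=``/``-mptx=`` spellings require a correspondingly recent GCC):

```shell
# Pass -march=/-mptx= through to the nvptx offload compiler.
gcc -fopenmp -foffload=nvptx-none \
    -foffload-options=nvptx-none=-march=sm_70 \
    -foffload-options=nvptx-none=-mptx=7.0 \
    test.c -o test
```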
Implementation remarks:
* I/O within OpenMP target regions and OpenACC parallel/kernels is supported
  using the C library ``printf`` functions. Note that the Fortran
  ``print`` / ``write`` statements are not yet supported.
* Compiling OpenMP code that contains ``requires reverse_offload``
  requires at least ``-march=sm_35``; compiling for ``-march=sm_30``
  is not supported.