..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. _nvptx:

nvptx
*****

On the hardware side, there is the hierarchy (fine to coarse):

* thread

* warp

* thread block

* streaming multiprocessor

All OpenMP and OpenACC levels are used, i.e.

* OpenMP's simd and OpenACC's vector map to threads

* OpenMP's threads ('parallel') and OpenACC's workers map to warps

* OpenMP's teams and OpenACC's gangs use a threadpool with the
  size of the number of teams or gangs, respectively.

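As an illustration of the mapping above, the following minimal sketch uses one combined OpenMP construct that exercises all three levels. The function name ``saxpy`` and its parameters are hypothetical and chosen only for this example. When compiled without ``-fopenmp``, the pragma is ignored and the loop runs serially on the host.

```c
/* Hypothetical helper for illustration: y[i] += a * x[i] for n elements.  */
void
saxpy (int n, float a, const float *x, float *y)
{
  /* teams        -> threadpool sized by the number of teams
     parallel for -> OpenMP threads, mapped to warps
     simd         -> mapped to the threads within a warp  */
  #pragma omp target teams distribute parallel for simd \
          map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}
```

The combined construct leaves the number of teams and threads to the implementation; the ``num_teams`` and ``thread_limit`` clauses can be added to request specific values.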
The sizes used are:

* The ``warp_size`` is always 32

* CUDA kernel launched: ``dim={#teams,1,1}, blocks={#threads,warp_size,1}``.

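The launch shape above can be worked through with a little arithmetic. The sketch below is not libgomp code; the struct and function names are hypothetical, and it only assumes the ``dim`` and ``blocks`` layout stated above with a ``warp_size`` of 32.

```c
/* Hypothetical types for illustration; not part of libgomp.  */
enum { WARP_SIZE = 32 };

struct launch_geometry
{
  unsigned dim[3];     /* dim={#teams,1,1} */
  unsigned blocks[3];  /* blocks={#threads,warp_size,1} */
};

/* Build the launch shape for a given number of teams and OpenMP threads.  */
struct launch_geometry
make_geometry (unsigned teams, unsigned threads)
{
  struct launch_geometry g = { { teams, 1, 1 }, { threads, WARP_SIZE, 1 } };
  return g;
}

/* Hardware threads per team: one warp of WARP_SIZE lanes per OpenMP thread.  */
unsigned long
lanes_per_team (const struct launch_geometry *g)
{
  return (unsigned long) g->blocks[0] * g->blocks[1] * g->blocks[2];
}

/* Hardware threads in the whole launch.  */
unsigned long
lanes_total (const struct launch_geometry *g)
{
  return (unsigned long) g->dim[0] * g->dim[1] * g->dim[2]
         * lanes_per_team (g);
}
```

For example, 4 teams with 8 OpenMP threads each give ``dim={4,1,1}, blocks={8,32,1}``, i.e. 256 hardware threads per team and 1024 in total.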
Additional information can be obtained by setting the environment variable
``GOMP_DEBUG=1`` (very verbose; grep for ``kernel.*launch`` for launch
parameters).

GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA.
CUDA caches the JIT-compiled code in the user's directory (see the CUDA
documentation; the cache can be tuned with the environment variables
``CUDA_CACHE_{DISABLE,MAXSIZE,PATH}``).

Note: While PTX ISA is generic, the ``-mptx=`` and ``-march=`` command-line
options still affect the PTX ISA code used and, thus, the requirements on
the CUDA version and hardware.

Implementation remarks:

* I/O within OpenMP target regions and OpenACC parallel/kernels is supported
  using the C library ``printf`` functions. Note that the Fortran
  ``print`` / ``write`` statements are not yet supported.

* Compiling OpenMP code that contains ``requires reverse_offload``
  requires at least ``-march=sm_35``; compiling for ``-march=sm_30``
  is not supported.

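The first remark can be illustrated with a minimal sketch of ``printf`` called from inside an OpenMP target region. The function name ``sum_on_device`` is hypothetical. Whether the region actually runs on the GPU depends on how GCC was configured and invoked; without offloading the region simply runs on the host.

```c
#include <stdio.h>

/* Hypothetical helper for illustration: sum 1..n inside a target region
   and print the result from within that region.  */
int
sum_on_device (int n)
{
  int sum = 0;

  /* 'sum' is copied to the device and back; 'n' is firstprivate.  */
  #pragma omp target map(tofrom: sum)
  {
    for (int i = 1; i <= n; i++)
      sum += i;
    /* C library printf is supported inside the target region.  */
    printf ("sum of 1..%d = %d\n", n, sum);
  }

  return sum;
}
```

Calling ``sum_on_device (10)`` returns 55 and prints one line from within the target region.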
.. -
   The libgomp ABI
   -