Copyright 1988-2022 Free Software Foundation, Inc.
This is part of the GCC manual.
For copying conditions, see the copyright.rst file.

On the hardware side, there is the hierarchy (fine to coarse):

* work item (thread)
* wavefront
* work group
* compute unit (CU)

All OpenMP and OpenACC levels are used:

* OpenMP's simd and OpenACC's vector map to work items (threads).
* OpenMP's threads ('parallel') and OpenACC's workers map
  to wavefronts.
* OpenMP's teams and OpenACC's gangs use a threadpool with the
  size of the number of teams or gangs, respectively.
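
As a rough illustration of this mapping, the following C sketch uses an OpenMP
combined construct whose levels correspond to the tiers listed above. The
function name ``scale_add`` and its parameters are assumptions made for this
example only. When the code is compiled without OpenMP or offloading support,
the pragma is ignored and the loop simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical example function: y[i] = a * x[i] + y[i].
   With GCN offloading enabled, the levels of the combined construct are
   expected to map as described in the text:
     teams    -> threadpool sized by the number of teams
     parallel -> wavefronts
     simd     -> work items (threads)  */
void scale_add(size_t n, float a, const float *x, float *y)
{
  #pragma omp target teams distribute parallel for simd \
      map(to: x[0:n]) map(tofrom: y[0:n])
  for (size_t i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```

How the iterations are actually distributed depends on the compiler and the
device; the comments only restate the mapping given above.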

The sizes used are:

* The number of teams is the specified ``num_teams`` (OpenMP) or
  ``num_gangs`` (OpenACC), or otherwise the number of CUs.
* The number of wavefronts is 4 for gfx900 and 16 otherwise;
  ``num_threads`` (OpenMP) and ``num_workers`` (OpenACC)
  override this if smaller.
* The wavefront has 102 scalars and 64 vectors.
* The number of work items is always 64.
* The hardware permits at most 40 work groups per CU and
  16 wavefronts per work group, up to a limit of 40 wavefronts in total per CU.
* 80 scalar registers and 24 vector registers are available in non-kernel
  functions (the chosen procedure-calling API).
* For the kernel itself: as many as register pressure demands (number of
  teams and number of threads, scaled down if registers are exhausted).
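
The sizing rules above can be restated as a small C model. This is only a
sketch of the documented rules, not code from libgomp: the type
``gcn_geometry`` and both function names are hypothetical, a request of 0 is
assumed to mean "clause not given", and the scaling down under register
pressure is not modelled.

```c
#include <stdbool.h>

/* Hypothetical model of the documented sizing rules; not libgomp code.  */
struct gcn_geometry
{
  int teams;       /* number of teams or gangs */
  int wavefronts;  /* wavefronts per team */
  int work_items;  /* work items per wavefront */
};

/* A request of 0 means the program did not specify the clause.  */
struct gcn_geometry
gcn_default_geometry (bool is_gfx900, int compute_units,
                      int requested_teams, int requested_threads)
{
  struct gcn_geometry g;

  /* num_teams/num_gangs if given, otherwise the number of CUs.  */
  g.teams = requested_teams > 0 ? requested_teams : compute_units;

  /* 4 wavefronts on gfx900, 16 otherwise; a smaller num_threads or
     num_workers value overrides the default.  */
  g.wavefronts = is_gfx900 ? 4 : 16;
  if (requested_threads > 0 && requested_threads < g.wavefronts)
    g.wavefronts = requested_threads;

  /* The number of work items is always 64.  */
  g.work_items = 64;
  return g;
}

/* Upper bound on work groups resident on one CU, from the two hardware
   limits: at most 40 work groups and at most 40 wavefronts per CU.
   WAVEFRONTS_PER_GROUP is assumed to be in the range 1..16.  */
int
gcn_max_work_groups_per_cu (int wavefronts_per_group)
{
  int by_wavefronts = 40 / wavefronts_per_group;
  return by_wavefronts < 40 ? by_wavefronts : 40;
}
```

Under this model, a device other than gfx900 with 60 CUs (a made-up figure)
and no clauses would get 60 teams of 16 wavefronts with 64 work items each,
and at most two such 16-wavefront work groups would fit on one CU.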

Implementation remarks:

* I/O within OpenMP target regions and OpenACC parallel/kernels regions is
  supported using the C library ``printf`` functions and the Fortran
  ``print`` / ``write`` statements.
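
A minimal C sketch of such I/O from within an OpenMP target region is shown
here. The helper name ``report_iterations`` is hypothetical, and the order in
which the lines appear is not specified when the loop runs in parallel.
Without offloading support the pragma is ignored and the loop prints from the
host.

```c
#include <stdio.h>

/* Hypothetical helper: prints one line per iteration from inside the
   target region and returns how many iterations ran.  */
int report_iterations(int n)
{
  int count = 0;
  #pragma omp target teams distribute parallel for \
      map(tofrom: count) reduction(+: count)
  for (int i = 0; i < n; i++)
    {
      printf("iteration %d of %d\n", i, n);
      count += 1;
    }
  return count;
}
```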