..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. _amd-radeon:

AMD Radeon (GCN)
****************

On the hardware side, the execution hierarchy is, from fine to coarse:

* work item (thread)

* wavefront

* work group

* compute unit (CU)

All OpenMP and OpenACC levels of parallelism are used, i.e.

* OpenMP's simd and OpenACC's vector map to work items (threads)

* OpenMP's threads ('parallel') and OpenACC's workers map
  to wavefronts

* OpenMP's teams and OpenACC's gangs use a thread pool with the
  size of the number of teams or gangs, respectively.

The sizes used are:

* Number of teams is the specified ``num_teams`` (OpenMP) or
  ``num_gangs`` (OpenACC) or, if neither is specified, the number of CUs

* Number of wavefronts is 4 for gfx900 and 16 otherwise;
  ``num_threads`` (OpenMP) and ``num_workers`` (OpenACC)
  override this if smaller.

* The wavefront has 102 scalars and 64 vectors

* Number of work items is always 64

* The hardware permits at most 40 work groups per CU and
  16 wavefronts per work group, up to a limit of 40 wavefronts in total per CU.

* 80 scalar registers and 24 vector registers are available in non-kernel
  functions (the chosen procedure-calling API).

* For the kernel itself: as many as register pressure demands (number of
  teams and number of threads, scaled down if registers are exhausted).

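The default sizes listed above can be restated as a small piece of C. This is
only an illustrative sketch: ``gcn_launch_dims`` and ``gcn_default_dims`` are
hypothetical names, not libgomp interfaces, a value of 0 is assumed here to
mean "clause not specified", and the scaling down that happens when registers
are exhausted is not modeled.

.. code-block:: c

  #include <string.h>

  /* Hypothetical container for the sizes described in the list above.  */
  struct gcn_launch_dims
  {
    int teams;       /* teams (OpenMP) or gangs (OpenACC) */
    int wavefronts;  /* threads (OpenMP) or workers (OpenACC) */
    int work_items;  /* simd lanes (OpenMP) or vector length (OpenACC) */
  };

  /* Hypothetical helper.  NUM_TEAMS and NUM_THREADS are the clause values,
     with 0 assumed to mean "not specified".  */
  static struct gcn_launch_dims
  gcn_default_dims (const char *arch, int compute_units,
                    int num_teams, int num_threads)
  {
    struct gcn_launch_dims d;

    /* Teams: the requested number, otherwise one per compute unit.  */
    d.teams = num_teams > 0 ? num_teams : compute_units;

    /* Wavefronts: 4 on gfx900, 16 otherwise; a smaller request wins.  */
    d.wavefronts = strcmp (arch, "gfx900") == 0 ? 4 : 16;
    if (num_threads > 0 && num_threads < d.wavefronts)
      d.wavefronts = num_threads;

    /* Work items: always 64.  */
    d.work_items = 64;
    return d;
  }

For example, under these assumptions a gfx900 device with 64 CUs and no
clauses yields 64 teams of 4 wavefronts with 64 work items each.
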
Implementation remark:

* I/O within OpenMP target regions and OpenACC parallel/kernels is supported
  using the C library ``printf`` functions and the Fortran
  ``print`` / ``write`` statements.
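
The I/O support mentioned above might be exercised as in the following sketch.
The function ``print_from_target`` is a hypothetical example, not a libgomp
interface; it calls ``printf`` inside an OpenMP target region and returns the
number of iterations executed so that the caller can check that the region ran.

.. code-block:: c

  #include <stdio.h>

  /* Hypothetical example: print one line per iteration from inside a
     target region and report how many iterations were executed.  */
  int
  print_from_target (int n)
  {
    int visited = 0;

    /* The loop runs sequentially inside the target region, so the
       increment of VISITED needs no synchronization.  */
    #pragma omp target map(tofrom: visited)
    for (int i = 0; i < n; i++)
      {
        printf ("iteration %d inside the target region\n", i);
        visited++;
      }

    return visited;
  }

If no offload device is available, or the file is compiled without OpenMP
support, the same code runs on the host and prints the same lines.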