..
  Copyright 1988-2022 Free Software Foundation, Inc.
  This is part of the GCC manual.
  For copying conditions, see the copyright.rst file.

.. program:: Nvidia PTX

.. index:: Nvidia PTX options, nvptx options

.. _nvidia-ptx-options:

Nvidia PTX Options
^^^^^^^^^^^^^^^^^^

These options are defined for Nvidia PTX:

.. option:: -m64

  Ignored, but preserved for backward compatibility. Only the 64-bit ABI is
  supported.

.. option:: -march={architecture-string}

  Generate code for the specified PTX ISA target architecture
  (e.g. :samp:`sm_35`). Valid architecture strings are :samp:`sm_30`,
  :samp:`sm_35`, :samp:`sm_53`, :samp:`sm_70`, :samp:`sm_75` and
  :samp:`sm_80`.
  The default depends on how the compiler has been configured; see
  :option:`--with-arch`.

  This option sets the value of the preprocessor macro
  ``__PTX_SM__``; for instance, for :samp:`sm_35`, it has the value
  :samp:`350`.

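As an illustration of how code might react to the architecture selected with ``-march=``, the following minimal sketch tests ``__PTX_SM__`` at compile time. The names ``TARGET_SM``, ``target_sm`` and ``target_sm_at_least`` are hypothetical, and the fallback value 0 for non-nvptx compilers is an assumption made only so that the sketch also builds on a host.

```c
#include <assert.h>

/* __PTX_SM__ is defined only when compiling for nvptx.  The fallback
   value 0 is an assumption of this sketch, not something GCC defines.  */
#ifdef __PTX_SM__
# define TARGET_SM __PTX_SM__
#else
# define TARGET_SM 0
#endif

/* Hypothetical helper: the architecture level seen at compile time,
   e.g. 350 when compiled with -march=sm_35.  */
int
target_sm (void)
{
  return TARGET_SM;
}

/* Hypothetical helper: nonzero if the code was compiled for at least
   LEVEL, where LEVEL uses the same encoding as __PTX_SM__.  */
int
target_sm_at_least (int level)
{
  assert (level >= 0);
  return TARGET_SM >= level;
}
```

A caller could, for example, guard an architecture-specific code path with ``target_sm_at_least (350)``.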
.. option:: -misa={architecture-string}

  Alias of :option:`-march=`.

.. option:: -march-map={architecture-string}

  Select the closest available :option:`-march=` value that is not more
  capable. For instance, for :option:`-march-map=sm_50` select
  :option:`-march=sm_35`, and for :option:`-march-map=sm_53` select
  :option:`-march=sm_53`.

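The "closest value that is not more capable" rule can be pictured as picking the largest supported architecture level that does not exceed the requested one. The sketch below only illustrates that rule using the ``-march=`` values listed in this section; it is not GCC's implementation, and the function name ``map_arch`` and the ``-1`` result for requests below ``sm_30`` are assumptions.

```c
#include <stddef.h>

/* Architecture levels accepted by -march= according to this section,
   in ascending order.  */
static const int supported_sm[] = { 30, 35, 53, 70, 75, 80 };

/* Hypothetical helper: return the largest supported level that does
   not exceed REQUESTED, or -1 if there is none (an assumption of this
   sketch; the section does not say what happens in that case).  */
int
map_arch (int requested)
{
  int best = -1;
  size_t n = sizeof supported_sm / sizeof supported_sm[0];

  for (size_t i = 0; i < n; i++)
    if (supported_sm[i] <= requested)
      best = supported_sm[i];

  return best;
}
```

With this sketch, ``map_arch (50)`` yields 35 and ``map_arch (53)`` yields 53, matching the two examples given for ``-march-map=``.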
.. option:: -mptx={version-string}

  Generate code for the specified PTX ISA version (e.g. :samp:`7.0`).
  Valid version strings include :samp:`3.1`, :samp:`6.0`, :samp:`6.3`, and
  :samp:`7.0`. The default PTX ISA version is 6.0, unless a higher
  version is required for the PTX ISA target architecture specified via
  option :option:`-march=`.

  This option sets the values of the preprocessor macros
  ``__PTX_ISA_VERSION_MAJOR__`` and ``__PTX_ISA_VERSION_MINOR__``;
  for instance, for :samp:`3.1` the macros have the values :samp:`3` and
  :samp:`1`, respectively.

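A version check built on ``__PTX_ISA_VERSION_MAJOR__`` and ``__PTX_ISA_VERSION_MINOR__`` might look like the following minimal sketch. The names ``PTX_MAJOR``, ``PTX_MINOR``, ``version_at_least`` and ``ptx_isa_at_least`` are hypothetical, and treating a non-nvptx compiler as version 0.0 is an assumption made so that the sketch builds on a host as well.

```c
/* The two macros are defined only when compiling for nvptx.  The 0.0
   fallback is an assumption of this sketch.  */
#ifdef __PTX_ISA_VERSION_MAJOR__
# define PTX_MAJOR __PTX_ISA_VERSION_MAJOR__
# define PTX_MINOR __PTX_ISA_VERSION_MINOR__
#else
# define PTX_MAJOR 0
# define PTX_MINOR 0
#endif

/* Hypothetical helper: nonzero if version MAJOR.MINOR is at least
   REQ_MAJOR.REQ_MINOR.  */
int
version_at_least (int major, int minor, int req_major, int req_minor)
{
  return major > req_major
         || (major == req_major && minor >= req_minor);
}

/* Hypothetical helper: nonzero if the PTX ISA version selected at
   compile time is at least REQ_MAJOR.REQ_MINOR.  */
int
ptx_isa_at_least (int req_major, int req_minor)
{
  return version_at_least (PTX_MAJOR, PTX_MINOR, req_major, req_minor);
}
```

Comparing the major and minor numbers separately avoids assuming any particular combined encoding of the version.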
.. option:: -mmainkernel

  Link in code for a ``__main`` kernel. This is for stand-alone execution
  rather than offloading.

.. option:: -moptimize

  Apply partitioned execution optimizations. This is the default when any
  level of optimization is selected.

.. option:: -msoft-stack

  Generate code that does not use ``.local`` memory
  directly for stack storage. Instead, a per-warp stack pointer is
  maintained explicitly. This enables variable-length stack allocation (with
  variable-length arrays or ``alloca``), and when global memory is used for
  underlying storage, makes it possible to access automatic variables from other
  threads, or with atomic instructions. This code generation variant is used
  for OpenMP offloading, but the option is exposed on its own for the purpose
  of testing the compiler; to generate code suitable for linking into programs
  using OpenMP offloading, use option :option:`-mgomp`.

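To make the "variable-length stack allocation" case concrete, here is a small, ordinary C function that uses a variable-length array. It is only an illustration of the kind of code that ``-msoft-stack`` is described as enabling on this target; the function name ``sum_squares`` is hypothetical and nothing in it is specific to nvptx.

```c
/* Hypothetical example: sum 0*0 + 1*1 + ... + (n-1)*(n-1), using a
   variable-length array as scratch space.  N must be positive, since
   a zero-length variable-length array is not valid C.  */
int
sum_squares (int n)
{
  int scratch[n];   /* Size is known only at run time.  */
  int total = 0;

  for (int i = 0; i < n; i++)
    scratch[i] = i * i;

  for (int i = 0; i < n; i++)
    total += scratch[i];

  return total;
}
```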
.. option:: -muniform-simt

  Switch to a code generation variant that allows all threads in each warp
  to execute, while maintaining memory state and side effects as if only one
  thread in each warp were active outside of OpenMP SIMD regions. All atomic
  operations and calls to the runtime (``malloc``, ``free``, ``vprintf``) are
  executed conditionally (if and only if the current lane index equals the
  master lane index), and the register being assigned is copied via a shuffle
  instruction from the master lane. Outside of SIMD regions, lane 0 is the
  master; inside, each thread sees itself as the master. The shared memory
  array ``int __nvptx_uni[]`` stores all-zeros or all-ones bitmasks for each
  warp, indicating the current mode (0 outside of SIMD regions). Each thread
  can bitwise-and the bitmask at position ``tid.y`` with the current lane
  index to compute the master lane index.

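The master lane computation described for ``-muniform-simt`` reduces to a single bitwise AND. The following host-side sketch models only that arithmetic; the function name ``master_lane`` is hypothetical, and the mask is passed as a plain argument rather than read from ``__nvptx_uni[]``.

```c
/* Hypothetical model of the master lane computation.  MODE_MASK is
   all-zeros (0) outside of SIMD regions and all-ones (-1) inside,
   and LANE is the current lane index within the warp.  */
int
master_lane (int mode_mask, int lane)
{
  /* Outside SIMD regions: 0 & lane == 0, so lane 0 is the master.
     Inside SIMD regions: -1 & lane == lane, so each thread is its
     own master.  */
  return mode_mask & lane;
}
```

A thread would then perform a guarded operation only when its own lane index equals the value computed this way.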
.. option:: -mgomp

  Generate code for use in OpenMP offloading: enables the :option:`-msoft-stack`
  and :option:`-muniform-simt` options, and selects the corresponding multilib
  variant.