]> git.ipfire.org Git - thirdparty/gcc.git/blob - gcc/doc/gcc/extensions-to-the-c-language-family/using-vector-instructions-through-built-in-functions.rst
sphinx: add missing trailing newline
[thirdparty/gcc.git] / gcc / doc / gcc / extensions-to-the-c-language-family / using-vector-instructions-through-built-in-functions.rst
1 ..
2 Copyright 1988-2022 Free Software Foundation, Inc.
3 This is part of the GCC manual.
4 For copying conditions, see the copyright.rst file.
5
6 .. _vector-extensions:
7
8 Using Vector Instructions through Built-in Functions
9 ****************************************************
10
11 On some targets, the instruction set contains SIMD vector instructions which
12 operate on multiple values contained in one large register at the same time.
13 For example, on the x86 the MMX, 3DNow! and SSE extensions can be used
14 this way.
15
16 The first step in using these extensions is to provide the necessary data
17 types. This should be done using an appropriate ``typedef`` :
18
19 .. code-block:: c++
20
21 typedef int v4si __attribute__ ((vector_size (16)));
22
23 The ``int`` type specifies the :dfn:`base type`, while the attribute specifies
24 the vector size for the variable, measured in bytes. For example, the
25 declaration above causes the compiler to set the mode for the ``v4si``
26 type to be 16 bytes wide and divided into ``int`` sized units. For
27 a 32-bit ``int`` this means a vector of 4 units of 4 bytes, and the
28 corresponding mode of ``foo`` is V4SI.
29
30 The ``vector_size`` attribute is only applicable to integral and
31 floating scalars, although arrays, pointers, and function return values
32 are allowed in conjunction with this construct. Only sizes that are
33 positive power-of-two multiples of the base type size are currently allowed.
34
35 All the basic integer types can be used as base types, both as signed
36 and as unsigned: ``char``, ``short``, ``int``, ``long``,
37 ``long long``. In addition, ``float`` and ``double`` can be
38 used to build floating-point vector types.
39
40 Specifying a combination that is not valid for the current architecture
41 causes GCC to synthesize the instructions using a narrower mode.
42 For example, if you specify a variable of type ``V4SI`` and your
43 architecture does not allow for this specific SIMD type, GCC
44 produces code that uses 4 ``SIs``.
45
46 The types defined in this manner can be used with a subset of normal C
47 operations. Currently, GCC allows using the following operators
48 on these types: ``+, -, *, /, unary minus, ^, |, &, ~, %``.
49
50 The operations behave like C++ ``valarrays``. Addition is defined as
51 the addition of the corresponding elements of the operands. For
52 example, in the code below, each of the 4 elements in :samp:`{a}` is
53 added to the corresponding 4 elements in :samp:`{b}` and the resulting
54 vector is stored in :samp:`{c}`.
55
56 .. code-block:: c++
57
58 typedef int v4si __attribute__ ((vector_size (16)));
59
60 v4si a, b, c;
61
62 c = a + b;
63
64 Subtraction, multiplication, division, and the logical operations
65 operate in a similar manner. Likewise, the result of using the unary
66 minus or complement operators on a vector type is a vector whose
67 elements are the negative or complemented values of the corresponding
68 elements in the operand.
69
70 It is possible to use shifting operators ``<<``, ``>>`` on
71 integer-type vectors. The operation is defined as following: ``{a0,
72 a1, ..., an} >> {b0, b1, ..., bn} == {a0 >> b0, a1 >> b1,
73 ..., an >> bn}``. Vector operands must have the same number of
74 elements.
75
76 For convenience, it is allowed to use a binary vector operation
77 where one operand is a scalar. In that case the compiler transforms
78 the scalar operand into a vector where each element is the scalar from
79 the operation. The transformation happens only if the scalar could be
80 safely converted to the vector-element type.
81 Consider the following code.
82
83 .. code-block:: c++
84
85 typedef int v4si __attribute__ ((vector_size (16)));
86
87 v4si a, b, c;
88 long l;
89
90 a = b + 1; /* a = b + {1,1,1,1}; */
91 a = 2 * b; /* a = {2,2,2,2} * b; */
92
93 a = l + a; /* Error, cannot convert long to int. */
94
95 Vectors can be subscripted as if the vector were an array with
96 the same number of elements and base type. Out of bound accesses
97 invoke undefined behavior at run time. Warnings for out of bound
98 accesses for vector subscription can be enabled with
99 :option:`-Warray-bounds`.
100
101 Vector comparison is supported with standard comparison
102 operators: ``==, !=, <, <=, >, >=``. Comparison operands can be
103 vector expressions of integer-type or real-type. Comparison between
104 integer-type vectors and real-type vectors are not supported. The
105 result of the comparison is a vector of the same width and number of
106 elements as the comparison operands with a signed integral element
107 type.
108
109 Vectors are compared element-wise producing 0 when comparison is false
110 and -1 (constant of the appropriate type where all bits are set)
111 otherwise. Consider the following example.
112
113 .. code-block:: c++
114
115 typedef int v4si __attribute__ ((vector_size (16)));
116
117 v4si a = {1,2,3,4};
118 v4si b = {3,2,1,4};
119 v4si c;
120
121 c = a > b; /* The result would be {0, 0,-1, 0} */
122 c = a == b; /* The result would be {0,-1, 0,-1} */
123
124 In C++, the ternary operator ``?:`` is available. ``a?b:c``, where
125 ``b`` and ``c`` are vectors of the same type and ``a`` is an
126 integer vector with the same number of elements of the same size as ``b``
127 and ``c``, computes all three arguments and creates a vector
128 ``{a[0]?b[0]:c[0], a[1]?b[1]:c[1], ...}``. Note that unlike in
129 OpenCL, ``a`` is thus interpreted as ``a != 0`` and not ``a < 0``.
130 As in the case of binary operations, this syntax is also accepted when
131 one of ``b`` or ``c`` is a scalar that is then transformed into a
132 vector. If both ``b`` and ``c`` are scalars and the type of
133 ``true?b:c`` has the same size as the element type of ``a``, then
134 ``b`` and ``c`` are converted to a vector type whose elements have
135 this type and with the same number of elements as ``a``.
136
137 In C++, the logic operators ``!, &&, ||`` are available for vectors.
138 ``!v`` is equivalent to ``v == 0``, ``a && b`` is equivalent to
139 ``a!=0 & b!=0`` and ``a || b`` is equivalent to ``a!=0 | b!=0``.
140 For mixed operations between a scalar ``s`` and a vector ``v``,
141 ``s && v`` is equivalent to ``s?v!=0:0`` (the evaluation is
142 short-circuit) and ``v && s`` is equivalent to ``v!=0 & (s?-1:0)``.
143
144 .. index:: __builtin_shuffle
145
146 Vector shuffling is available using functions
147 ``__builtin_shuffle (vec, mask)`` and
148 ``__builtin_shuffle (vec0, vec1, mask)``.
149 Both functions construct a permutation of elements from one or two
150 vectors and return a vector of the same type as the input vector(s).
151 The :samp:`{mask}` is an integral vector with the same width (:samp:`{W}`)
152 and element count (:samp:`{N}`) as the output vector.
153
154 The elements of the input vectors are numbered in memory ordering of
155 :samp:`{vec0}` beginning at 0 and :samp:`{vec1}` beginning at :samp:`{N}`. The
156 elements of :samp:`{mask}` are considered modulo :samp:`{N}` in the single-operand
157 case and modulo 2\* :samp:`{N}` in the two-operand case.
158
159 Consider the following example,
160
161 .. code-block:: c++
162
163 typedef int v4si __attribute__ ((vector_size (16)));
164
165 v4si a = {1,2,3,4};
166 v4si b = {5,6,7,8};
167 v4si mask1 = {0,1,1,3};
168 v4si mask2 = {0,4,2,5};
169 v4si res;
170
171 res = __builtin_shuffle (a, mask1); /* res is {1,2,2,4} */
172 res = __builtin_shuffle (a, b, mask2); /* res is {1,5,3,6} */
173
174 Note that ``__builtin_shuffle`` is intentionally semantically
175 compatible with the OpenCL ``shuffle`` and ``shuffle2`` functions.
176
177 You can declare variables and use them in function calls and returns, as
178 well as in assignments and some casts. You can specify a vector type as
179 a return type for a function. Vector types can also be used as function
180 arguments. It is possible to cast from one vector type to another,
181 provided they are of the same size (in fact, you can also cast vectors
182 to and from other datatypes of the same size).
183
184 You cannot operate between vectors of different lengths or different
185 signedness without a cast.
186
187 .. index:: __builtin_shufflevector
188
189 Vector shuffling is available using the
190 ``__builtin_shufflevector (vec1, vec2, index...)``
191 function. :samp:`{vec1}` and :samp:`{vec2}` must be expressions with
192 vector type with a compatible element type. The result of
193 ``__builtin_shufflevector`` is a vector with the same element type
194 as :samp:`{vec1}` and :samp:`{vec2}` but that has an element count equal to
195 the number of indices specified.
196
197 The :samp:`{index}` arguments are a list of integers that specify the
198 elements indices of the first two vectors that should be extracted and
199 returned in a new vector. These element indices are numbered sequentially
200 starting with the first vector, continuing into the second vector.
201 An index of -1 can be used to indicate that the corresponding element in
202 the returned vector is a don't care and can be freely chosen to optimized
203 the generated code sequence performing the shuffle operation.
204
205 Consider the following example,
206
207 .. code-block:: c++
208
209 typedef int v4si __attribute__ ((vector_size (16)));
210 typedef int v8si __attribute__ ((vector_size (32)));
211
212 v8si a = {1,-2,3,-4,5,-6,7,-8};
213 v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is {1,3,5,7} */
214 v4si c = {-2,-4,-6,-8};
215 v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
216
217 .. index:: __builtin_convertvector
218
219 Vector conversion is available using the
220 ``__builtin_convertvector (vec, vectype)``
221 function. :samp:`{vec}` must be an expression with integral or floating
222 vector type and :samp:`{vectype}` an integral or floating vector type with the
223 same number of elements. The result has :samp:`{vectype}` type and value of
224 a C cast of every element of :samp:`{vec}` to the element type of :samp:`{vectype}`.
225
226 Consider the following example,
227
228 .. code-block:: c++
229
230 typedef int v4si __attribute__ ((vector_size (16)));
231 typedef float v4sf __attribute__ ((vector_size (16)));
232 typedef double v4df __attribute__ ((vector_size (32)));
233 typedef unsigned long long v4di __attribute__ ((vector_size (32)));
234
235 v4si a = {1,-2,3,-4};
236 v4sf b = {1.5f,-2.5f,3.f,7.f};
237 v4di c = {1ULL,5ULL,0ULL,10ULL};
238 v4sf d = __builtin_convertvector (a, v4sf); /* d is {1.f,-2.f,3.f,-4.f} */
239 /* Equivalent of:
240 v4sf d = { (float)a[0], (float)a[1], (float)a[2], (float)a[3] }; */
241 v4df e = __builtin_convertvector (a, v4df); /* e is {1.,-2.,3.,-4.} */
242 v4df f = __builtin_convertvector (b, v4df); /* f is {1.5,-2.5,3.,7.} */
243 v4si g = __builtin_convertvector (f, v4si); /* g is {1,-2,3,7} */
244 v4si h = __builtin_convertvector (c, v4si); /* h is {1,5,0,10} */
245
246 .. index:: vector types, using with x86 intrinsics
247
248 Sometimes it is desirable to write code using a mix of generic vector
249 operations (for clarity) and machine-specific vector intrinsics (to
250 access vector instructions that are not exposed via generic built-ins).
251 On x86, intrinsic functions for integer vectors typically use the same
252 vector type ``__m128i`` irrespective of how they interpret the vector,
253 making it necessary to cast their arguments and return values from/to
254 other vector types. In C, you can make use of a ``union`` type:
255
256 .. In C++ such type punning via a union is not allowed by the language
257
258 .. code-block:: c++
259
260 #include <immintrin.h>
261
262 typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
263 typedef unsigned int u32x4 __attribute__ ((vector_size (16)));
264
265 typedef union {
266 __m128i mm;
267 u8x16 u8;
268 u32x4 u32;
269 } v128;
270
271 for variables that can be used with both built-in operators and x86
272 intrinsics:
273
274 .. code-block:: c++
275
276 v128 x, y = { 0 };
277 memcpy (&x, ptr, sizeof x);
278 y.u8 += 0x80;
279 x.mm = _mm_adds_epu8 (x.mm, y.mm);
280 x.u32 &= 0xffffff;
281
282 /* Instead of a variable, a compound literal may be used to pass the
283 return value of an intrinsic call to a function expecting the union: */
284 v128 foo (v128);
285 x = foo ((v128) {_mm_adds_epu8 (x.mm, y.mm)});