gcc/doc/gcc/extensions-to-the-c-language-family/using-vector-instructions-through-built-in-functions.rst

   1 ..
   2   Copyright 1988-2022 Free Software Foundation, Inc.
   3   This is part of the GCC manual.
   4   For copying conditions, see the copyright.rst file.
   5
   6 .. _vector-extensions:
   7
   8 Using Vector Instructions through Built-in Functions
   9 ****************************************************
  10
  11 On some targets, the instruction set contains SIMD vector instructions which
  12 operate on multiple values contained in one large register at the same time.
  13 For example, on the x86 the MMX, 3DNow! and SSE extensions can be used
  14 this way.
  15
  16 The first step in using these extensions is to provide the necessary data
  17 types.  This should be done using an appropriate ``typedef`` :
  18
  19 .. code-block:: c++
  20
  21   typedef int v4si __attribute__ ((vector_size (16)));
  22
  23 The ``int`` type specifies the :dfn:`base type`, while the attribute specifies
  24 the vector size for the variable, measured in bytes.  For example, the
  25 declaration above causes the compiler to set the mode for the ``v4si``
  26 type to be 16 bytes wide and divided into ``int`` sized units.  For
  27 a 32-bit ``int`` this means a vector of 4 units of 4 bytes, and the
  28 corresponding mode of ``foo`` is V4SI.
  29
  30 The ``vector_size`` attribute is only applicable to integral and
  31 floating scalars, although arrays, pointers, and function return values
  32 are allowed in conjunction with this construct. Only sizes that are
  33 positive power-of-two multiples of the base type size are currently allowed.
  34
  35 All the basic integer types can be used as base types, both as signed
  36 and as unsigned: ``char``, ``short``, ``int``, ``long``,
  37 ``long long``.  In addition, ``float`` and ``double`` can be
  38 used to build floating-point vector types.
  39
  40 Specifying a combination that is not valid for the current architecture
  41 causes GCC to synthesize the instructions using a narrower mode.
  42 For example, if you specify a variable of type ``V4SI`` and your
  43 architecture does not allow for this specific SIMD type, GCC
  44 produces code that uses 4 ``SIs``.
  45
  46 The types defined in this manner can be used with a subset of normal C
  47 operations.  Currently, GCC allows using the following operators
  48 on these types: ``+, -, *, /, unary minus, ^, |, &, ~, %``.
  49
  50 The operations behave like C++ ``valarrays``.  Addition is defined as
  51 the addition of the corresponding elements of the operands.  For
  52 example, in the code below, each of the 4 elements in :samp:`{a}` is
  53 added to the corresponding 4 elements in :samp:`{b}` and the resulting
  54 vector is stored in :samp:`{c}`.
  55
  56 .. code-block:: c++
  57
  58   typedef int v4si __attribute__ ((vector_size (16)));
  59
  60   v4si a, b, c;
  61
  62   c = a + b;
  63
  64 Subtraction, multiplication, division, and the logical operations
  65 operate in a similar manner.  Likewise, the result of using the unary
  66 minus or complement operators on a vector type is a vector whose
  67 elements are the negative or complemented values of the corresponding
  68 elements in the operand.
  69
  70 It is possible to use shifting operators ``<<``, ``>>`` on
  71 integer-type vectors. The operation is defined as following: ``{a0,
  72 a1, ..., an} >> {b0, b1, ..., bn} == {a0 >> b0, a1 >> b1,
  73 ..., an >> bn}``. Vector operands must have the same number of
  74 elements.
  75
  76 For convenience, it is allowed to use a binary vector operation
  77 where one operand is a scalar. In that case the compiler transforms
  78 the scalar operand into a vector where each element is the scalar from
  79 the operation. The transformation happens only if the scalar could be
  80 safely converted to the vector-element type.
  81 Consider the following code.
  82
  83 .. code-block:: c++
  84
  85   typedef int v4si __attribute__ ((vector_size (16)));
  86
  87   v4si a, b, c;
  88   long l;
  89
  90   a = b + 1;    /* a = b + {1,1,1,1}; */
  91   a = 2 * b;    /* a = {2,2,2,2} * b; */
  92
  93   a = l + a;    /* Error, cannot convert long to int. */
  94
  95 Vectors can be subscripted as if the vector were an array with
  96 the same number of elements and base type.  Out of bound accesses
  97 invoke undefined behavior at run time.  Warnings for out of bound
  98 accesses for vector subscription can be enabled with
  99 :option:`-Warray-bounds`.
 100
 101 Vector comparison is supported with standard comparison
 102 operators: ``==, !=, <, <=, >, >=``. Comparison operands can be
 103 vector expressions of integer-type or real-type. Comparison between
 104 integer-type vectors and real-type vectors are not supported.  The
 105 result of the comparison is a vector of the same width and number of
 106 elements as the comparison operands with a signed integral element
 107 type.
 108
 109 Vectors are compared element-wise producing 0 when comparison is false
 110 and -1 (constant of the appropriate type where all bits are set)
 111 otherwise. Consider the following example.
 112
 113 .. code-block:: c++
 114
 115   typedef int v4si __attribute__ ((vector_size (16)));
 116
 117   v4si a = {1,2,3,4};
 118   v4si b = {3,2,1,4};
 119   v4si c;
 120
 121   c = a >  b;     /* The result would be {0, 0,-1, 0}  */
 122   c = a == b;     /* The result would be {0,-1, 0,-1}  */
 123
 124 In C++, the ternary operator ``?:`` is available. ``a?b:c``, where
 125 ``b`` and ``c`` are vectors of the same type and ``a`` is an
 126 integer vector with the same number of elements of the same size as ``b``
 127 and ``c``, computes all three arguments and creates a vector
 128 ``{a[0]?b[0]:c[0], a[1]?b[1]:c[1], ...}``.  Note that unlike in
 129 OpenCL, ``a`` is thus interpreted as ``a != 0`` and not ``a < 0``.
 130 As in the case of binary operations, this syntax is also accepted when
 131 one of ``b`` or ``c`` is a scalar that is then transformed into a
 132 vector. If both ``b`` and ``c`` are scalars and the type of
 133 ``true?b:c`` has the same size as the element type of ``a``, then
 134 ``b`` and ``c`` are converted to a vector type whose elements have
 135 this type and with the same number of elements as ``a``.
 136
 137 In C++, the logic operators ``!, &&, ||`` are available for vectors.
 138 ``!v`` is equivalent to ``v == 0``, ``a && b`` is equivalent to
 139 ``a!=0 & b!=0`` and ``a || b`` is equivalent to ``a!=0 | b!=0``.
 140 For mixed operations between a scalar ``s`` and a vector ``v``,
 141 ``s && v`` is equivalent to ``s?v!=0:0`` (the evaluation is
 142 short-circuit) and ``v && s`` is equivalent to ``v!=0 & (s?-1:0)``.
 143
 144 .. index:: __builtin_shuffle
 145
 146 Vector shuffling is available using functions
 147 ``__builtin_shuffle (vec, mask)`` and
 148 ``__builtin_shuffle (vec0, vec1, mask)``.
 149 Both functions construct a permutation of elements from one or two
 150 vectors and return a vector of the same type as the input vector(s).
 151 The :samp:`{mask}` is an integral vector with the same width (:samp:`{W}`)
 152 and element count (:samp:`{N}`) as the output vector.
 153
 154 The elements of the input vectors are numbered in memory ordering of
 155 :samp:`{vec0}` beginning at 0 and :samp:`{vec1}` beginning at :samp:`{N}`.  The
 156 elements of :samp:`{mask}` are considered modulo :samp:`{N}` in the single-operand
 157 case and modulo 2\* :samp:`{N}` in the two-operand case.
 158
 159 Consider the following example,
 160
 161 .. code-block:: c++
 162
 163   typedef int v4si __attribute__ ((vector_size (16)));
 164
 165   v4si a = {1,2,3,4};
 166   v4si b = {5,6,7,8};
 167   v4si mask1 = {0,1,1,3};
 168   v4si mask2 = {0,4,2,5};
 169   v4si res;
 170
 171   res = __builtin_shuffle (a, mask1);       /* res is {1,2,2,4}  */
 172   res = __builtin_shuffle (a, b, mask2);    /* res is {1,5,3,6}  */
 173
 174 Note that ``__builtin_shuffle`` is intentionally semantically
 175 compatible with the OpenCL ``shuffle`` and ``shuffle2`` functions.
 176
 177 You can declare variables and use them in function calls and returns, as
 178 well as in assignments and some casts.  You can specify a vector type as
 179 a return type for a function.  Vector types can also be used as function
 180 arguments.  It is possible to cast from one vector type to another,
 181 provided they are of the same size (in fact, you can also cast vectors
 182 to and from other datatypes of the same size).
 183
 184 You cannot operate between vectors of different lengths or different
 185 signedness without a cast.
 186
 187 .. index:: __builtin_shufflevector
 188
 189 Vector shuffling is available using the
 190 ``__builtin_shufflevector (vec1, vec2, index...)``
 191 function.  :samp:`{vec1}` and :samp:`{vec2}` must be expressions with
 192 vector type with a compatible element type.  The result of
 193 ``__builtin_shufflevector`` is a vector with the same element type
 194 as :samp:`{vec1}` and :samp:`{vec2}` but that has an element count equal to
 195 the number of indices specified.
 196
 197 The :samp:`{index}` arguments are a list of integers that specify the
 198 elements indices of the first two vectors that should be extracted and
 199 returned in a new vector. These element indices are numbered sequentially
 200 starting with the first vector, continuing into the second vector.
 201 An index of -1 can be used to indicate that the corresponding element in
 202 the returned vector is a don't care and can be freely chosen to optimized
 203 the generated code sequence performing the shuffle operation.
 204
 205 Consider the following example,
 206
 207 .. code-block:: c++
 208
 209   typedef int v4si __attribute__ ((vector_size (16)));
 210   typedef int v8si __attribute__ ((vector_size (32)));
 211
 212   v8si a = {1,-2,3,-4,5,-6,7,-8};
 213   v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is {1,3,5,7} */
 214   v4si c = {-2,-4,-6,-8};
 215   v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */
 216
 217 .. index:: __builtin_convertvector
 218
 219 Vector conversion is available using the
 220 ``__builtin_convertvector (vec, vectype)``
 221 function.  :samp:`{vec}` must be an expression with integral or floating
 222 vector type and :samp:`{vectype}` an integral or floating vector type with the
 223 same number of elements.  The result has :samp:`{vectype}` type and value of
 224 a C cast of every element of :samp:`{vec}` to the element type of :samp:`{vectype}`.
 225
 226 Consider the following example,
 227
 228 .. code-block:: c++
 229
 230   typedef int v4si __attribute__ ((vector_size (16)));
 231   typedef float v4sf __attribute__ ((vector_size (16)));
 232   typedef double v4df __attribute__ ((vector_size (32)));
 233   typedef unsigned long long v4di __attribute__ ((vector_size (32)));
 234
 235   v4si a = {1,-2,3,-4};
 236   v4sf b = {1.5f,-2.5f,3.f,7.f};
 237   v4di c = {1ULL,5ULL,0ULL,10ULL};
 238   v4sf d = __builtin_convertvector (a, v4sf); /* d is {1.f,-2.f,3.f,-4.f} */
 239   /* Equivalent of:
 240      v4sf d = { (float)a[0], (float)a[1], (float)a[2], (float)a[3] }; */
 241   v4df e = __builtin_convertvector (a, v4df); /* e is {1.,-2.,3.,-4.} */
 242   v4df f = __builtin_convertvector (b, v4df); /* f is {1.5,-2.5,3.,7.} */
 243   v4si g = __builtin_convertvector (f, v4si); /* g is {1,-2,3,7} */
 244   v4si h = __builtin_convertvector (c, v4si); /* h is {1,5,0,10} */
 245
 246 .. index:: vector types, using with x86 intrinsics
 247
 248 Sometimes it is desirable to write code using a mix of generic vector
 249 operations (for clarity) and machine-specific vector intrinsics (to
 250 access vector instructions that are not exposed via generic built-ins).
 251 On x86, intrinsic functions for integer vectors typically use the same
 252 vector type ``__m128i`` irrespective of how they interpret the vector,
 253 making it necessary to cast their arguments and return values from/to
 254 other vector types.  In C, you can make use of a ``union`` type:
 255
 256 .. In C++ such type punning via a union is not allowed by the language
 257
 258 .. code-block:: c++
 259
 260   #include <immintrin.h>
 261
 262   typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
 263   typedef unsigned int  u32x4 __attribute__ ((vector_size (16)));
 264
 265   typedef union {
 266           __m128i mm;
 267           u8x16   u8;
 268           u32x4   u32;
 269   } v128;
 270
 271 for variables that can be used with both built-in operators and x86
 272 intrinsics:
 273
 274 .. code-block:: c++
 275
 276   v128 x, y = { 0 };
 277   memcpy (&x, ptr, sizeof x);
 278   y.u8  += 0x80;
 279   x.mm  = _mm_adds_epu8 (x.mm, y.mm);
 280   x.u32 &= 0xffffff;
 281
 282   /* Instead of a variable, a compound literal may be used to pass the
 283      return value of an intrinsic call to a function expecting the union: */
 284   v128 foo (v128);
 285   x = foo ((v128) {_mm_adds_epu8 (x.mm, y.mm)});